The working relationship between the Domain Expert and the AI Engineer is possibly the most important component of a successful AI project. But what does a successful working relationship involve?
First, there has to be a working relationship - you have to talk to each other! Regular, open dialogue is nothing flashy, but it matters more than the flashy stuff (and a lack of communication dooms many an AI project[1]). But more than this, you should want to talk to each other. A good outcome from an AI project reduces drudgery for the Domain Expert. Equally, the AI Engineer gets to learn something new about how the world works, which, at least for us at Artanis, is extremely satisfying.
Who is a Domain Expert?
Domain Experts can be anyone! What matters is knowledge of the specific process or task we are looking to automate with AI, particularly knowledge about the "correct" outcome of that process or task.[2]
For example:
If the task is extracting data from invoices for entry into accounting or ERP software, the Domain Expert is the person who knows what a "correct" extraction and entry is: Which fields are essential? Which are ignorable? Which are almost always essential except on invoices from that one really big customer, where we figure out that field ourselves? This person might be a financial controller, or the CEO, or the person who works in customer success. Job titles aren't always useful here.
If the task is grading English essays, then the Domain Expert is someone with deep knowledge of the grading criteria, how they should be applied, and experience in doing the grading. This is very obviously the English teacher. But if we are being really picky, we might prefer it to be a specific, expert grader who instructs other teachers in their marking.
It's crucial that the Domain Expert and the AI Engineer work closely together to overcome the curse of knowledge - the Domain Expert is so familiar with the task that they have forgotten what is necessary to learn it.[3] A Domain Expert can never write down an exhaustive description of the information needed for the task, nor a complete enumeration of the decisions they take to reach a "correct" solution. They cannot, because they're biased by things that are obvious to them but not at all obvious to anyone else. It's only by working closely together, following the process described below, that the AI Engineer can elicit all the tacit assumptions and contextual knowledge from the Domain Expert.
Example - Medical report review by cardiologist
Consider Domain Expert Chris, the cardiologist.
Chris's task is to review reports from implanted heart monitoring devices, summarise the reports, decide if an additional alert for intervention is required, and decide how severe this alert should be (go to the ER now vs. review this with your physician at your next appointment). Here, "correct" involves not just interpreting the information in the report, but making different decisions based on patient medical history. Even in instances where the medical decision is the same, Chris knows to write reports for different hospitals in different formats, as each hospital has its own format and information preferences.[4]
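To make the shape of a "correct" output concrete, here is a minimal sketch of the task's output as a data structure. The names and the three-level alert scale are our illustrative assumptions, not Chris's actual schema:

```python
from dataclasses import dataclass
from enum import Enum


class AlertLevel(Enum):
    """How urgent the intervention alert is (illustrative levels only)."""
    NONE = "no alert required"
    ROUTINE = "review with your physician at your next appointment"
    URGENT = "go to the ER now"


@dataclass
class ReportReview:
    """One "correct" solution to Chris's review task."""
    summary: str          # plain-language summary of the device report
    alert: AlertLevel     # whether, and how urgently, to intervene
    hospital_format: str  # each hospital has its own format and information preferences
```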
The AI Engineer's job is to be an inquisitive sponge. That is, to be (initially) completely unfamiliar with the task so that they can ask as many questions as possible, primarily "What information are you using to make that decision?". There are always tacit assumptions in any task requiring domain expertise. Extracting these assumptions can only be done by spending time observing the Domain Expert perform the task.
Once the AI Engineer has an understanding of "correct", they build a processing pipeline with an appropriate mix of software and AI to produce "correct" solutions to the task. Instances where the AI and the Domain Expert disagree are very important – they highlight areas where 1) the AI is inaccurate and/or 2) there's a new tacit assumption or previously unconsidered piece of information that influences the "correct" solution. Occasionally, these disagreements instead highlight errors in the ground-truth "correct" data, which then needs updating. This process of comparing AI task solutions against "ground truth" Domain Expert solutions is called disagreement review.
These disagreements can be very silly or obvious things to the Domain Expert! "This person needs to see their doctor because, whilst their heart is perfectly healthy, the battery is low and needs replacing", or "We can ignore this report because it was from the day the device was activated, and this [very unhealthy looking signal] is actually from before it was implanted into the patient". All very obvious to the Domain Expert, all completely non-obvious to both the AI and the AI Engineer.
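In code, the heart of disagreement review is nothing more exotic than a diff between the AI's solutions and the Domain Expert's. A minimal sketch, assuming (for simplicity) that solutions are strings keyed by report ID:

```python
from dataclasses import dataclass


@dataclass
class Disagreement:
    report_id: str
    ai_solution: str
    expert_solution: str  # the ground truth... usually!


def disagreement_review(ai_solutions: dict[str, str],
                        expert_solutions: dict[str, str]) -> list[Disagreement]:
    """Collect every case where the AI and the Domain Expert differ.

    Each disagreement is either 1) an AI inaccuracy, 2) a tacit assumption
    we haven't captured yet, or 3) occasionally, an error in the ground truth.
    """
    return [
        Disagreement(report_id, ai_solutions.get(report_id, ""), expert_solution)
        for report_id, expert_solution in expert_solutions.items()
        if ai_solutions.get(report_id, "") != expert_solution
    ]
```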
The working relationship involves the following phases:
1. We observe, on a video call, Chris performing the task (reviewing reports) and talking out loud about the reasons for certain decisions, with the AI Engineer asking clarifying questions. We also review these recordings repeatedly to understand the decision-making process in its entirety.
2. We then build an AI processing pipeline to perform, as best it can, the task as we understand it. This pipeline might initially simplify aspects of the task for the AI, to quickly get AI outputs back in front of Chris.
3. This pipeline runs in parallel to Chris for a short period, highlighting areas of disagreement.
4. We present these disagreements back to Chris, which further illuminates the information needed for the AI to make a "correct" decision.
5. We improve the AI and processing pipeline using Chris's feedback, and repeat steps 4 and 5 until AI performance plateaus (sketched in code below).
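A sketch of that loop in code, with `evaluate` and `improve` standing in for the human-in-the-loop steps 4 and 5. Both function names, and the plateau threshold, are our invention:

```python
def run_until_plateau(pipeline, evaluate, improve, min_gain: float = 0.01):
    """Repeat steps 4 and 5 until AI performance plateaus.

    `evaluate` runs the pipeline in parallel to Chris and returns the
    agreement rate plus the disagreements; `improve` folds Chris's
    feedback on those disagreements back into the pipeline.
    """
    previous_agreement = float("-inf")
    while True:
        agreement, disagreements = evaluate(pipeline)  # step 4
        if agreement - previous_agreement < min_gain:  # plateau reached
            return pipeline, agreement
        pipeline = improve(pipeline, disagreements)    # step 5
        previous_agreement = agreement
```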
Choosing an appropriate, stable task
One pivotal moment that precedes everything in the Domain Expert <> AI Engineer relationship is the choice of task to automate with AI.
Appropriate tasks must be stable enough for AI. Task stability means the inputs (e.g. report types, formats, additional sources of information) and the outputs (report summaries, alert decisions and levels) remain static. LLMs have expanded the scope of what it means for an information source to remain static. Previously, small changes in formatting and structure of a document could break a software processing pipeline; LLMs are more robust to these structural changes, so long as the information is still present. As a technical heuristic, if you can assemble an evaluation data set and/or can feasibly monitor simple predictive model performance in production, the task is stable.
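One way to read that heuristic in code: if you can honestly write down a file like the sketch below today, and it will still be valid in six months, the task is stable. The invoice fields here are made up for illustration:

```python
# A tiny evaluation data set: stable inputs paired with "correct" outputs.
EVAL_SET = [
    {"input": "invoices/acme_0001.pdf", "expected": {"total": "1240.00", "currency": "EUR"}},
    {"input": "invoices/acme_0002.pdf", "expected": {"total": "310.50", "currency": "EUR"}},
]


def accuracy(pipeline) -> float:
    """Fraction of evaluation cases the pipeline solves exactly as the
    Domain Expert would."""
    correct = sum(pipeline(case["input"]) == case["expected"] for case in EVAL_SET)
    return correct / len(EVAL_SET)
```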
Tasks must be stable for AI to be useful. Unstable tasks are unlikely to be automatable enough to justify the time/money investment: they don't generate enough of the same type of data[5], and/or they occur so infrequently that automating them isn't worth it.
Ideally, the Domain Expert and AI Engineer jointly choose a specific, repeatable and stable task that is a subset of the Domain Expert's general responsibilities (in the previous example, report reviewing). Sometimes the choice of task is made externally, by leadership or by parties outside the company. This can go either way: sometimes injecting a fresh perspective, or pushing a team to try a radically different approach to a problem, is helpful; sometimes the choice is ill-conceived and doomed to fail from inception because the data doesn't exist, or the AI approach is structurally/technologically impossible. Avoiding the latter is essential to building AI that actually works.
Lastly, there are technical reasons that make task stability an essential component of AI project success. The most important is that stability lets the AI Engineer connect the output of disagreement review (and data labelling more generally) to improved model performance. This connection lets us make the Domain Expert the AI model owner.
What does being a “model owner” mean for a Domain Expert?
Imagine we’re in the latter stages of the AI project. We’ve been through disagreement reviews, we’ve observed and recorded the task being performed, the Domain Expert has answered so many obvious questions that they cannot conceive of any more bizarro edge cases that could possibly arise. Are we done? Sort of™!
We aren’t done[6], but this is the moment that model ownership can transfer to the Domain Expert. It would be reasonable to assume that further model maintenance and improvement depend on the AI Engineer, but as we've seen, it all comes down to how well the Domain Expert's expertise is extracted and embedded into the model. Through the process of labelling and disagreement review, working in partnership with the AI Engineer, we have constructed a way to extract that expertise systematically. By using the platform we’ve built to facilitate labelling and/or disagreement review, the Domain Expert can continue to refine the model without needing input from the AI Engineer. This is how we define model ownership: the ability to refine and improve the AI model - and by doing so, the Domain Expert makes themselves ever more efficient.
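One concrete mechanism for this (and it is only one of several; fine-tuning or rule updates are alternatives) is that every correction the Domain Expert records becomes a worked example in the model's prompt. A hypothetical sketch:

```python
def build_prompt(task_instructions: str,
                 expert_corrections: list[tuple[str, str]],
                 new_input: str) -> str:
    """Fold the Domain Expert's reviewed corrections into the prompt as
    few-shot examples. Each correction recorded on the labelling platform
    improves the model, with no AI Engineer in the loop."""
    shots = "\n\n".join(
        f"Input: {inp}\nCorrect output: {out}"
        for inp, out in expert_corrections
    )
    return f"{task_instructions}\n\n{shots}\n\nInput: {new_input}\nCorrect output:"
```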
What every successful Domain Expert <> AI Engineer working relationship has in common
There’s no single best way to be a Domain Expert in an AI project, but every successful AI project, and every successful AI Engineer and Domain Expert relationship, shares these properties:
The Domain Expert decides what "correct" is for the task. It’s important that there’s only one definition of “correct”. Some tasks will have multiple solutions, especially if there are multiple potential Domain Experts for the project, and they might not agree with each other!
The AI Engineer and the Domain Expert regularly meet whilst the Domain Expert performs the task (and the AI Engineer asks a lot of questions) to minimise the curse of knowledge.[7]
The AI Engineer builds AI and surrounding software to enable automation of the task.
Disagreements between AI predictions and labelled, ground truth data provided by the Domain Expert are reviewed together.
There’s a shared, joint understanding that the working relationship is symbiotic – the AI Engineer can’t do the Domain Expert’s job, and can’t build anything that eliminates drudgery for the Domain Expert without their continual input.
The AI Engineer quickly builds AI that actually works (and genuinely reduces the drudgery in the Domain Expert's job!), even if it is initially very limited in scope.
The most important thing is that there has to be a working relationship. Project success depends on AI Engineers and Domain Experts actually talking to each other (rare though that is), and personally, I find it to be the most interesting part of my job!
Footnotes

[1] It’s not just us who think this; much smarter people like Chip Huyen say the same thing.
[2] Sometimes there is more than one "correct" outcome. In this case we're really interested in the Domain Expert's preference. See “What does good enough look like?”.
[3] Imagine you are teaching a course about how to use Google Docs. You might imagine you need to teach the contents of the "Edit" menu, or the most useful keyboard shortcuts. You ask the class to open an empty Google Doc in their web browser of choice to start teaching these points, when a class attendee asks "What's a web browser?"
[4] Bikeshedding is not just a software engineering affliction.
[5] Humans still learn much quicker and off many fewer data points!
[6] Data drift, gradual covariate shift, and concept drift all take hold over sufficiently long timescales.
[7] There is, of course, an XKCD for this: https://xkcd.com/2501