
Beyond the First Answer: How the Expert Motion Drives Real Expertise in LLMs

Learn how to leverage LLMs as knowledge organizers in cybersecurity rather than opinion sources. Industry insights on AI’s practical applications.

At Arctic Wolf, we believe the future of cybersecurity is built on AI guided by human expertise. Staying at the forefront of security operations means not just adopting new technology but deeply understanding how and where it should be applied.

In this blog series, Kenneth Ray, SVP and Chief Innovation Architect at Arctic Wolf, along with other Arctic Wolf experts and leaders from the wider security community, share their perspectives on large language models (LLMs), what they are good at, where they fall short, and what it really takes to use them responsibly in real-world security operations.

These posts offer a candid look at technical perspectives from across the industry and contribute to an open conversation about the future of AI in cybersecurity. In this post, Kenneth shares his thoughts on how we can use LLMs to organize our thinking and knowledge rather than asking them directly for their opinions.

Introduction

Every day, security leaders make decisions that determine whether an incident becomes a headline or a footnote. Those decisions are rarely about lacking information. They’re about judgment:

  • Which alerts matter?
  • What to investigate first?
  • When to escalate?
  • What can safely wait until morning and what can’t?

Modern systems are getting very good at producing answers. Fast ones. Confident ones. The trick is that, in high-stakes environments, the nuance of an answer matters far more than its grace, clarity, or speed.

What matters is whether the recommendation reflects real-world experience. Whether the system can explain why it’s leaning one way rather than another — and show the evidence behind it.

At Arctic Wolf, we believe the next phase of security operations isn’t about replacing analysts; it’s about amplifying expertise.

In this series, Arctic Wolf leaders and members of the broader security community explore what it takes to deploy intelligent systems responsibly inside real SOCs where automation is measured not by novelty, but by outcomes: shorter and more efficient paths to detection, faster containment, and decisions executives can defend.

Let’s focus on one principle that sits at the center of that approach: using this new technology to capture, apply, and vastly increase the total amount of experience applied to problems.

The Right Question Is More Than Half the Answer

Ask any of today’s popular large language models (LLMs) a question that requires expertise, and you’ll likely get a very good answer. With access to the whole of the internet — published research, countless articles, and blogs from armies of experts and influencers all vying to be subject matter experts — the model is primed to give you a confident, well-reasoned response, often complete with links to back it up.

These tools are also becoming exceptional dialogue partners, especially in helping craft the right question, which is often the hardest part of any inquiry. We’ve all seen this in action. Anthropic’s ads highlight this well: a thinking partner that is no longer bound to simply answering questions. LLMs are also brilliant at discussing nuance, implications, and follow-up questions.

However, LLMs’ answers diverge when asked for judgment or strategy, yielding multiple well-reasoned yet distinct perspectives. This variability reveals the essence of expertise itself, which is informed interpretation rather than uniform truth.

The Creative Strength of Variability in LLMs

While at first glance this might seem like a flaw, this open-ended and creative variability is one of an LLM’s greatest strengths. It’s almost as if it’s able to read the context behind your questions, digest the gist of your intent, and then canvass a whole universe of possible responses to deliver one it intuits you will find useful. And then iterate from there.

This means you don’t have to know exactly how to ask the question. You can be vague, relying on context and conversational cues. For instance, I could ask:

“I just heard on the news about an attack that was hitting unexpected things; hospital beds could have been mentioned, I’m not sure. Do you know what I’m referencing and what I should do about it?”

And get back a fantastic answer that cuts through all that ambiguity:

“It sounds like you’re referencing [multiple attacks] …. The average hospital bed can have 10–15 connected devices, making them a potential entry point for ransomware or disruption campaigns. … Attackers are increasingly hitting smaller hospitals and clinics with modest resources, because they often have weaker cybersecurity defenses.”

Then, with high confidence, it could provide six things a CISO should do and offer to create a draft executive briefing on the topic.

This solves a problem that has plagued search for decades. When Google first gave us the ability to search our own email, it felt groundbreaking. Except, you had to know exactly what to search for. What was that person’s name? Did the email arrive in the last two years or three? What specific words did she use? Finding a specific quote was tedious. That problem is now fixed.

(As an aside, the variability of LLM responses is a fascinating topic, and it underscores the need for human oversight. See The Case for Eyeballs-Near-LLM Usage to explore this further.)

Fact vs. Advice: A Tale of Two Answers

This variability, however, has its limits, and understanding them is key to using these tools effectively.

If you ask an LLM a purely factual question — like the distance between the Earth and the Moon — you’ll get a variety of correct answers, depending on which context fits best (e.g., closest center to center, average surface to surface, the full range). While most facts are repeated consistently across the internet, most opinions are not. When you ask for opinions, you get a variety of answers.

A simple question of “I just got an alert that malicious software was blocked on my computer, what should I do now?” will return different sets of next steps and priorities. Sure, most of those recommendations overlap, but is that good enough? Not for a critical mission. The fact that there are differences means you have to spend extra time and energy tuning, judging, etc. This is especially true when automating a SOC, where a missed investigation step could result in overlooking an active low-and-slow attack.
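To make that divergence concrete, here is a minimal sketch of how you might measure it yourself. The canned step sets stand in for repeated calls to whatever LLM you use (they are illustrative, not real responses), and Jaccard overlap is just one simple way to score agreement:

```python
from itertools import combinations

# Stand-ins for the next-step lists parsed out of repeated LLM answers
# to the same question. In practice these would come from your provider.
responses = [
    {"isolate the host", "review quarantine logs", "run a full scan"},
    {"isolate the host", "run a full scan", "notify the SOC"},
    {"review quarantine logs", "run a full scan", "reset credentials"},
]

def average_overlap(step_sets: list[set[str]]) -> float:
    """Average Jaccard overlap between every pair of answers:
    1.0 means identical recommendations; lower means divergence."""
    scores = [len(a & b) / len(a | b) for a, b in combinations(step_sets, 2)]
    return sum(scores) / len(scores)

print(f"Average overlap across answers: {average_overlap(responses):.0%}")
```

Anything well below 100% represents exactly the extra tuning and judging work described above.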

How the Expert Motion Factors Into LLMs

Explicit use of expertise is the next evolution of this concept: fusing the instinctual human motion of applying experience with the strengths of LLMs. AI expert skills extend the reasoning capabilities of LLMs by directing them to retrieve the relevant data rather than synthesize an answer on their own.

The Expert Motion

When talking to an expert (advisors, doctors, lawyers, etc.), the conversation usually follows the same three core steps (a code sketch of this loop follows the list):

  1. Understanding the problem: The expert asks you informed questions to tease apart the nuance, finding the important details that you may not have known were important.
  2. Retrieving the relevant experience: The expert says: “I’ve seen something like this before,” and discusses with you how and why those memories were selected. It could be that you go back and forth a few times until you agree: “Yes, those past experiences would be great.”
  3. Applying that experience: The expert then applies that experience to your specific situation, explaining what has worked in similar cases and what has not. They outline likely scenarios, estimate potential outcomes, and recommend actions that can improve your chances of success. Most importantly, their experience helps distinguish between competing possibilities, guiding the next investigative or diagnostic steps needed to narrow uncertainty, refine understanding, and arrive at the most informed conclusion about what to do next. Based on experience, the expert can enumerate (a) the possible outcomes, (b) the investigatory steps that will discriminate between them (and those that won’t), and (c) the useful actions (and those that aren’t).
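In code, that motion might look something like the sketch below. This is a rough illustration under stated assumptions, not a description of any real system: the keyword tagging in step 1 stands in for an LLM-driven clarifying dialogue, and Case is a hypothetical record of one tagged line item of experience with its eventual outcome.

```python
from collections import Counter
from dataclasses import dataclass

@dataclass
class Case:
    """Hypothetical record: one tagged line item of experience."""
    tags: set[str]
    action: str
    outcome: str  # the eventual, back-propagated result

def understand(problem: str) -> set[str]:
    """Step 1: tease out the nuance. A real system would run a
    clarifying dialogue; here we just extract keyword tags."""
    return {w.strip(".,?!").lower() for w in problem.split()}

def retrieve(tags: set[str], history: list[Case], k: int = 5) -> list[Case]:
    """Step 2: surface candidate matches for the human to vet."""
    return sorted(history, key=lambda c: len(c.tags & tags), reverse=True)[:k]

def apply_experience(matches: list[Case]) -> Counter:
    """Step 3: no opinions, only counts of what happened when
    each action was taken in the matched cases."""
    return Counter((c.action, c.outcome) for c in matches)

history = [
    Case({"ransomware", "hospital", "devices"}, "isolate segment", "contained"),
    Case({"ransomware", "clinic"}, "isolate segment", "contained"),
    Case({"ransomware", "hospital"}, "wait and monitor", "spread"),
]
tags = understand("Ransomware hitting hospital connected devices?")
print(apply_experience(retrieve(tags, history)))
```

The vetting loop in step 2 (“no, not exactly, here’s why this is different”) would happen between retrieve and apply_experience, with a human in the conversation.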

Opinions show up only in step 2 — specifically, the expert’s opinion of what counts as ‘the relevant experience.’ That type of opinion is relatively easy for a ‘problem haver’ to understand and evaluate. We’ve all had experiences discussing these matches with our advisors.

There are no opinions in step 3. Step 3 relies on accumulated empirical evidence and labeled outcomes rather than analyst intuition and raw intelligence alone.1

Fusing this motion to the strengths of an LLM is straightforward, assuming you have:

  • Sufficient relevant experience captured in your historical data
  • Each “line item of experience” properly tagged
  • The eventual outcomes back-propagated onto each line item (a short sketch of this follows below)
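Here is a minimal sketch of that last bullet, assuming an illustrative record layout; the ExperienceItem type and close_incident helper are hypothetical, not any particular product’s schema:

```python
from dataclasses import dataclass

@dataclass
class ExperienceItem:
    """One tagged line item of experience, recorded as work happens."""
    incident_id: str
    tags: list[str]
    action: str
    outcome: str | None = None  # unknown until the incident closes

def close_incident(log: list[ExperienceItem], incident_id: str, outcome: str) -> None:
    """Back-propagate the eventual outcome onto every line item of
    that incident, so future retrieval can cite it as evidence."""
    for item in log:
        if item.incident_id == incident_id:
            item.outcome = outcome

log = [
    ExperienceItem("INC-1042", ["ransomware", "hospital"], "isolate segment"),
    ExperienceItem("INC-1042", ["ransomware", "hospital"], "reset credentials"),
]
close_incident(log, "INC-1042", "contained, no data loss")
```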

LLMs are fantastic at teasing out the nuance of the problem (step 1) and at finding insightful, valuable matches (step 2).2

They can also discuss with you how that data matches and refine the selection until you both agree the matches are right. Finally, after the relevant data has been selected, they can reason over it. In fact, they will be able to grab far more experience than any one human could ever store in their head.

Instead of relying solely on probabilistic text generation, the LLM is directed to use the data itself, extracting the cause and effect from millions of line items of experience.

Various reinforcement learning and human-augmented methods help us apply expertise to new security use cases with improved results as measured by accuracy, precision, recall, and contextual reasoning — in other words, we can marry humans and AI to advance performance.

The Expert Motion transforms variability into informed consistency — leaping past merely plausible answers to provably useful guidance backed by hard data, statistics, and transparent reasoning.

Whether prioritizing alerts in a cloud security stack, optimizing a new integration, or evaluating customer risk posture, an LLM can cite empirical evidence from comparable scenarios, quantify likely outcomes, and explain its logic. What emerges is not another chatbot, but a system capable of reflecting both the breadth of accumulated expertise and the clarity of scientific validation — an expert that both reasons and proves.

This distinction gets to the heart of what we look for in an expert.

Options from an Expert Advisor

“I don’t want a single answer. I want options.”

— Arctic Wolf CISO Adam Marrè

What most leaders truly want is not just a raw list of options. What they’re really seeking is an incisive and measured analysis. More important than the pros and cons of those options is the ‘why’ behind the expert’s claims — and the learnings extracted from previous experience.

A true expert doesn’t just dump a mountain of information on you. They listen to your problem, consider the context, and ensure that they understand your situation. Then they match their experience to what you’re asking. When you hear them say “this reminds me of xyz,” what they are doing is applying their lived experience to this new situation. More importantly, they’re looking to validate that the experience is indeed a match, working through all the rounds of “no, not exactly, here’s why what we have here is different” until the best matches are found.

As you walk through the nuances of the problem and how and why the previous experience applies or doesn’t apply, you will wind up with a curated set of viable, vetted options. They’ll say,

“Here are the three paths forward. Path A is likely the fastest; these were the risks identified, here were the mitigations (needed or not), and these were the outcomes matched to those line items. Path B is more robust, requires more resources, and will likely create this set of friction when you roll it out to your execs. Based on our goals, I’d lean toward Path B, and here’s why.”

When the follow-up questions come, (“why did you identify this risk,” “could you give me examples,” etc.) that experience — the statistics, the nuance — is right there. Drill in as deep as you want.

This is the level of interaction we should aspire to have with our AI tools. We shouldn’t treat them as a vending machine for answers but as a tireless, creative, and knowledgeable consultant.

The Conversation Changes Completely

No longer are we wondering which of two different chatbot answers is best, how much trust to give them, or what validations we need to perform.

Just like talking to a trusted advisor, we want to know how much experience she has, how well the retrieved experience matches our problem, and what conclusions she can draw. In other words, our intuition works very well in assessing both her experience and her expertise.

Fusing LLMs to explicit experience using The Expert Motion unlocks the magic:

  • First, it’s in the dialogue: the ability to explore a problem space, weigh different perspectives, and refine our own thinking with a partner that has digested more information than any single human ever could. This is where today’s LLMs do well and are improving.
  • Second, it’s in direct access and interaction with the experiential data, and the ability to find and discuss the very specific experience brought to bear.

How can the advisor, with conviction, give those three options and identify the nuances and preconditions? It’s because they are referencing their prior experience. That specific experience — what they have seen before and how it applies to this new question — is what gives both the credibility and the basis for that nuance.

Now imagine that advisor with perfect recall and access not to 20,000 or even 40,000 hours of experience, but to ~4 million. The conversation changes completely!

Instead of giving synthesized ideas (requiring you to follow the footnotes to see how accurate that synthesis is and what may have been missed or misquoted), it sounds more like:

“Yes, I’ve seen something like this before. In 200 cases, the pattern followed one path. In 300 others, it unfolded differently. Here are the highlights. Shall we discuss the nuance?”

While it is comforting to hear confidence in your expert’s response, it’s far more practical to be able to drill into the details, see the statistics, understand how those conclusions were drawn, and see how that experience applies to your own situation.

No opinions, no guesses; only the evidence of what matched, what didn’t, and the statistical correlations and outcomes that support the conclusions.
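As a sketch of how a response like the one above could be grounded, assume a retrieval step (like the one sketched earlier) has returned matched cases with their recorded patterns and outcomes; the counts below are illustrative, not real incident data:

```python
from collections import Counter

# Illustrative matched cases: (pattern, outcome) pairs surfaced by
# retrieval over tagged historical experience. Counts are made up.
matched = (
    [("followed path A", "contained within hours")] * 200
    + [("unfolded differently", "required a full rebuild")] * 300
)

def cite_evidence(cases: list[tuple[str, str]]) -> None:
    """Print outcome counts so conclusions cite evidence, not opinion."""
    for (pattern, outcome), n in Counter(cases).most_common():
        print(f"In {n} cases the incident {pattern}: {outcome}.")

cite_evidence(matched)
```

Every number in the answer then traces back to specific, inspectable line items rather than to the model’s synthesis.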

This doesn’t change the fact that you still need validation, reproduction, independent self-orchestrating agents, confidence scoring, regression testing, benchmarking, and all the rest. But here is what it does do:

The Expert Motion moves us from debating abstract answers to interrogating concrete evidence and prior cases. “The goal is not to get the LLM to think for us, but to help us think better”: to surface patterns, comparisons, and probabilities drawn from experience far beyond any individual’s reach. That’s what it means to use The Expert Motion.

1 – Note: It is highly likely that a superintelligent analyst, or for that matter a superintelligent agent, will absorb, at an incredible rate, both her direct lived experience and the experience of all her coworkers.

2 – Anthropic research recently highlighted how, when they directed one of their recent AI models at some of the “most well-tested codebases (projects that have had fuzzers running against them for years, accumulating millions of hours of CPU time), Opus 4.6 found high-severity vulnerabilities, some that had gone undetected for decades.”
