Tim Klawa Explains Decision-Grade AI for Critical Missions

[rsfv]

How Decision-Grade AI is Transforming Diverse Industries: A Conversation with Tim Klawa, Head of Product, Figure Eight Federal | Episode 5

ExtraMile by KnowledgeNile brings you candid conversations with leaders driving technological innovation. In this episode, Tim Klawa, Head of Product at Figure Eight Federal (Appen’s FOCI-mitigated arm for U.S. defense/intelligence) unpacks the rigorous science behind decision-grade AI so reliable, it powers life-or-death decisions. From human-in-the-loop data labeling to benchmarking generative AI models, Tim reveals how Figure Eight Federal bridges the gap between cutting-edge technology and real-world impact. Discover how their Artemis platform handles complex data types (SAR, LiDAR, multispectral imagery) and why partnerships with Azure Government and domain experts are critical for tailored defense solutions.

Key Takeaways:

Trust Over Speed: Decision-grade AI prioritizes justified confidence, proving AI’s reasoning aligns with human expertise.
Red Teaming Matters: Adversarial testing creates edge cases to ensure models perform under pressure.
Human + Machine Collaboration: Domain experts annotate data to codify decision-making logic for AI.
Edge AI’s Future: Offline, reasoning-based models will revolutionize DDIL (denied/disconnected) environments.
Agile Development: ICAA award-winning approaches optimize AI training pipelines for rapid retraining.

About Our Guest

Tim Klawa

Tim Klawa is a technology leader specializing in product innovation and the accelerated adoption of cutting-edge technology for defense and intelligence clients. With extensive experience in the intelligent transportation systems industry, Tim has held key positions focused on providing technology consulting to government clients. He has led product efforts at a globally renowned company, Kapsch, driving advancements and adoption of connected vehicle technology, data mesh, AI detection systems, and advanced decision support systems. Tim leads the Figure Eight Federal (F8F) product portfolio, specializing in supporting complex AI use cases and enabling key integrations and partnerships across the defense and intelligence sector. His expertise and vision have been instrumental in transforming complex technological challenges into streamlined, impactful solutions for F8F clients

About Company

Figure Eight Federal

Figure Eight Federal (F8F) is Appen’s FOCI-mitigated arm dedicated to serving the U.S. defense and intelligence sectors. F8F delivers accurate, reliable, human-annotated datasets that power critical AI and machine learning applications. With native support for a wide range of datatypes—including Synthetic Aperture Radar (SAR), LiDAR, and multispectral imagery—the Artemis platform enables labeling directly on data in its original form. The platform also supports benchmarking and fine-tuning of generative AI through human-in-the-loop evaluations performed by domain experts, while integrated generative AI capabilities help mitigate the programmatic risks inherent in large-scale data labeling efforts. By leveraging Azure OpenAI Service within Azure Government, F8F offers a powerful solution for unlocking domain-specific, optimized generative AI innovations tailored to defense workflows.

Transcript

Host: Hello everyone, welcome back to another episode of ExtraMile by KnowledgeNile, where we bring our insights from the leaders shaping the future of technology. I'm Sudakshina, your host, and today we have a truly fascinating conversation lined up for you.

For today's interview, we are delighted to have Tim Klawa, Head of Product at Figure Eight Federal, a company at the forefront of building decision-grade AI for high-stake industries like defense, healthcare, and finance.

Their technology transforms raw data into actionable, reliable insights and makes sure that critical decisions are supported by the highest-quality AI models. From geospatial intelligence to secure data labeling, the company empowers federal agencies and enterprises to operate with precision and confidence. Tim, with his great experience in AI and product innovation, has played an important role in driving solutions that bridge the gap between cutting-edge technology and real-world impact.

Let us discuss how AI is transforming mission-critical environments and the future of edge computing.

A very warm welcome, Tim. It's a pleasure to have you with us today. How are you doing?

Tim: Very good, thanks for having me.

Host: Thank you so much. So, Figure Eight Federal works with sectors like defense, health, and finance to build decision-grade AI. What does that term mean to you, and how do products like Artemis and Hydra AI turn raw data into actionable insights for these high-stakes industries?

Tim: Yeah, for sure. So, our focus as a company is really about how to make AI useful to accelerate the mission of our clients, regardless of what sector they operate in. And at the end of the day, that really comes down to one simple binary attribute.

Can you trust AI with the decision? So, the reason we talk about decision-grade AI is holding ourselves accountable, if you will, to enabling our end clients to have that comfort and that trust in the system that we deliver. And so, just to kind of give you an example of that, a lot of times AI systems are somewhat of a black box.

You don't understand how exactly it's arriving at its decision. And so, we specialize in the development of AI training data, really about how do we map the requirements of a given mission space to the actual data that we're training our models on so that individuals can see very early on the different types of edge cases or what-if scenarios and how that gets played out in the development of those AI models for things like defense or health or finance.And what that enables us to do is really remove a lot of that ambiguity in the development of new models to make sure that the end operators or the analysts or the users that are using an AI system don't just have to blindly trust something, but they can trust something with what we like to call justified confidence. And it's all about enabling AI to partner with those analysts and operators to help them make better decisions.

Host: That was very fascinating to know, Tim. So, your work focuses on federal AI initiatives where mistakes aren't an option. How do you ensure the data labeling and models powering the systems are accurate enough for life-or-death situations or decisions?

Tim: So, I guess to start with, I kind of like to look at this in the reverse. There are so many industries and sectors that could use AI where really the speed of a decision is almost as important as the decision itself. And so, just to kind of give an example, think about driving, for example.

You're driving down a highway, and then some catastrophe happens, and you really have a split second of how to react in that scenario to either prevent or cause an accident or even the magnitude or scale of that accident. And so, kind of as a hypothetical, of course, it would be better right if you could sit down with physicists to understand the different trajectories of the different vehicles involved and how to minimize the risk of a massive pileup. Being able to sit down with analysts to determine that if you turn one way, what's the likely reaction of the other driver in the other car to make a response that would be beneficial or complementary to your decision.

But at the end of the day, the amount of analysis and the amount of analyzing the situation is perfect for the development of a report from two weeks after an event saying, hey, this is what would have been the best approach. But in the actual live environment, that type of manual process doesn't get you anywhere. And so, what I like to think of as AI systems supporting decision support systems is really that we like to say they have to be perfect, but I like to look at it from the perspective of, is it better than the alternative of having a manual approach?

And as long as AI can assist with making a faster, better decision, being able to bring in more inputs and more comprehensive analysis, it allows you to reduce your chances of failure, even if that AI system isn't 100% correct. But, you know, on the other hand, like what you said, especially in defense and Intel, you can't be wrong. You have to be not just better than the human alternative, but way better than the human alternative because the cost of making mistakes, especially from like a liability standpoint, is so much worse.

And so, that's one of the things that we take as kind of a core aspect of our company is really around how do we mitigate risk in the development of AI training data. And that comes down to all sorts of things like adversarial testing, which we call red teaming, which is around how do you basically throw curveballs to an AI system while you're developing to basically play out all those scenarios. So, taking that analogy from like a potential wreck, kind of looking at it when you do have the time, which is the development of that AI training data, instead of your physicist or your behavioral analyst, right, analyzing the scene after a wreck, we take those experts and we use those experts to do those what-if scenarios in the development of AI training data, really to determine how to capture the reasoning process or the decision-making process from experts, and then codify that into AI training data so that AI training data can mimic the decisions of a human, but do it faster, which is critical in a lot of these use cases to assist speed of a decision is what's highly important.

Host: Fantastic. That clearly states why your clients rely on you. Jeff partnered with groups like Latent AI and Enveil. How do these collaborations accelerate Figure Eight Federal's mission? Can you share an example of a partnership that directly solved a problem from a client?

Tim: Yeah, for sure. So, we're partnered with a lot of different companies in the defense and intel space, and then also in the commercial sector. Our parent company has partnerships with, you know, a wide majority of the Fortune 100 companies, and really it comes down to, especially in a complex field like AI, that it takes partners that are experts in different areas to all contribute, especially when you're looking at AI augmented systems, like a decision support system or something of that nature.

Because of the complexity, especially in the defense sector around different weapons systems, you're going to have a wide variety of companies that are specialists in different areas. Like you mentioned, Latent AI, they specialize in being able to optimize models for edge computing. Enveil specializes in how you can have better encryption between different domains to protect your data privacy as you flow data between different systems.

And so, as we approach those larger holistic environments in the defense and intelligence sector, and then also on the commercial space in a wide variety of different industries, from health, transportation, to investment, banking, and whatnot, it all really comes down to who you're partnered with. And so, it's something where I think, as a company, we don't necessarily strive to be the expert in every single different domain, but to be the trusted expert in the development of high-quality AI training data, and then find those other partners that we can collaborate with to deliver the best possible solution to our end clients. And so, it kind of brings down to the culture of who we are as a company about building interoperable systems that are modular and allow our clients to plug and play with a wide variety of different vendors.

But just to kind of give you an example of a partnership that has directly solved a problem for a client would be a recent project we were involved with for the DoD, where we ended up showcasing with a couple partners how we could integrate native PaaS services into our labeling platform to develop a more holistic, rapid retraining pipeline. And that basically allowed us to showcase how you can quickly evolve your AI models for different domains, which becomes highly important when you're dealing with an operational landscape, like a kinetic environment in a war zone, for example, where things are never constant. And so, in order for AI to be relevant, you have to be able to rapidly retrain and iterate that model very quickly.

Host: Definitely. It's amazing to learn from you how collaboration can accelerate innovation. See human domain monitoring service tracks real-world activities. How do you balance AI automation with human oversight, especially in sensitive areas like defense or healthcare?

Tim: Yeah, for sure. So, basically, that kind of comes down to our Hydra AI product portfolio that we specialize in looking at the patterns and movements of large groups of people around the world. And really, it's around how to look at trends at a macro level that can benefit individuals at a micro level.

And so, just to kind of give an example of that, in the transportation sector, when you look holistically at how individuals move through a city as policy and decision makers or individuals who are looking at the next 10 years of that city, you may want to understand, like, how do I develop my next transportation network to be more effective to the individuals who live in this city? Where should you put your next metro stop? Or how do people do the first, last mile destination when they're going to work during the work week?

And so, by looking at those trends of where and how people move, we can determine patterns and look for anomalies. So, it allows us really to not necessarily replace the analyst or the human, but really enable the human oversight to have more situational awareness. So, like, for example, even in healthcare, this can come down to how early can you predict whether or not there's a potential virus or pandemic that's starting to occur and how that could potentially spread in order to mitigate the effects as quickly as possible.

And so, looking at those trends of seeing disruption patterns in human movement or, you know, accelerated occupancy at different hospitals, at universities, or things of that nature, allow you to flag those early trigger warnings for analysts to be able to use their expertise and their medical background or their transportation background or whatever sector that they're involved with to make more informed decisions.

So, I would say that the human domain monitoring is less about completely, you know, developing automation to make decisions and more around how do you highlight some of those things that typically go unnoticed or require policymakers or subject matter experts in a field to do a lot of guesswork and provide them the type of data that allows them to make decisions with more confidence.

Host: Amazing. So, you have said you take breaks down knowledge silos. How does Figure Eight Federal convince traditional sectors like intelligence to share data and trust your platform with it?

Tim: Sure. So, I think what we like to talk about of breaking down knowledge silos is less about how to share data and things of that nature and more around what we like to call breaking down knowledge silos from a human perspective. And so, we focus in the development of AI training data, which is really the science of capturing human intelligence in a form that machines can learn from.

And our mission is really about breaking down silos that exist within organizations or within different groups of individuals around domain specific knowledge that certain individuals have and making sure we capture everyone's unique input to make AI better. And that's something that I'm super passionate about because I firmly believe that every single person has some unique attribute or some unique skill that is critical to enhancing the next generation of AI models. And really, that comes down to the frontier data around, especially as AI models and generative AI particularly becomes more and more powerful, becomes data hungry.

And we need to find better and more original data from human experts in order to accelerate the next generation of AI models. And so, all of our technology is really designed around how do we measure the performance and how do we determine what somebody's good at so that we can assemble those teams of individuals with different expertise or different backgrounds or unique inputs to accelerate the ability of AI to make better and more informed decisions.

Host: Fantastic. So, with tools like Gen AI Optimizer, you're refining models for field use. Where do this see the biggest potential for AI and AI computing over the next five years? Is it warfighters, disasters response, or something unexpected?

Tim: For sure. So, basically, our contributions with regards to generative AI come in kind of two different areas. One, we help with fine-tuning LLMs to be able to make them more appropriate for some niche use cases.

And then, we also focus really heavily on adversarial testing of AI systems and evaluation of LLMs that allow someone to determine whether or not a particular LLM is suited for their given use case and look at the risks. And so, they can go in knowing how to develop the appropriate guardrails or the appropriate end application systems that individuals are going to interact with to make sure that they enhance the ability of augmenting humans with AI into mission success. And so, I guess I would say that I think with regards to edge computing over the next five years, I think that it's going to have a huge impact with the future of reasoning-based foundational models in really having access to reasoning capabilities in DDIL or denied, degraded, intermittent, or limited bandwidth environments.

And I think that as we approach faster compute capabilities on the edge, as well as smaller, more niche reasoning models that use less compute, we're going to see the ability over the next five years for these models to increasingly accelerate their ability to do reasoning, and at some point, outperform the ability of a human to reason on something to such an extent that I think it's going to become part of our daily environment in those edge environments where we may or may not have connectivity back to a central server farm, but the need to be able to have AI-augmented decision support in the battlefield or in a disaster response where you may not have internet connectivity. So, I think it will be less around one particular sector and more around the ability to have kind of these offline models that just become part of our everyday decision-making process.

Host: That was awesome. So, your team won a 2025 ICAA Award for Work in Agile and Earned Value Management. What is one lesson from that project that changed how Figure Eight Federal approaches product development?

Tim: Yeah, for sure. So, basically, we won that ICAA Award in partnership with Training Data Project and Technomics. Training Data Project come in a little bit more from the policy standpoint and the frameworks, us coming as the developer of AI training data, and then Technomics from kind of the program management and cost estimation standpoint, and each of us came from it from a unique perspective, contributing each our own angle, and really the focus of that research paper that we put into ICAA this year was around how to look at removing some of the surface-level metrics and get deeper metrics and understanding around how to optimize from a costing perspective and from a program management perspective the development of larger AI systems.

And so, a lot of times in large AI programs, there's a lot of guesswork from individuals making decisions around different components of the AI lifecycle where they may not be able or have the understanding of nuanced things within different elements of the AI lifecycle that allow them to take a data-driven approach to their decision making. And so, our focus in the paper was how to measure the diminishing returns to different elements of your AI pipelines, how to look at where investment would be most strategic to removing a bottleneck or unleashing AI innovation in a more accelerated fashion. And so, we're super excited about being able to collaborate on that front and then also win an award on that.

But I think at a larger viewpoint, we're excited to be able to contribute to kind of the standards and framework or philosophy around how to leverage things that we've learned in the commercial sector from dealing with some of the largest AI system developments on the planet to accelerating that within federal programs, ensuring that the government's getting what they paid for and that the government knows how to use the particular levers that we talk about in that paper to unlock AI innovation.

Host: Perfect. So, was there any specific moment in your career or a challenge or breakthrough that made you think that this is what I want to do? Can you share us your story of it?

Tim: For sure. So, I think I was focusing, I come from a background in intelligent transportation systems focusing on things like integrated corridor management systems and large data systems for smart city applications. And I switched over to the defense and intel space primarily to focus on development AI training data just because I saw how critical getting the training data was to developing better AI models.

And so, I think that moment for me would be on one of the recent research projects I was involved in recently. I was able to, I remember the first time I was able to demonstrate how to measure the impact of various AI training data characteristics before I trained a model. And that allowed me to train a model on 60% less data while arriving at a model that was 5% more accurate than training on all the data.

And especially when you look at some of these really large models around how much compute is necessary to train that, the ability to showcase the correlations of how important determining the value of every single data element going into your AI model was to accelerating not only the accuracy of that model but reducing things like the amount of data you had to collect or the amount of compute was really a hot moment, I think, to me and made me kind of validated why we were focusing on this space so heavily.

Host: That was wonderful. So, with that, Tim, we end our Q&A round. Thank you so much for sharing your actionable insights and a bit of your personal journey too.

It is clear why Figure Eight Federal is a trusted name in AI and the leadership is a big part of that. Thank you.

Thank you, everyone, for joining us today. I am your host Sudakshina signing off. See you in the next episode of ExtraMile by KnowledgeNile with the next extraordinary leader on board sharing their thoughts and knowledge.

Stay tuned.

Tune Into Our Other Informative Interviews:

How Travel Technology APIs and AI Solutions Are Transforming the Hotel Booking Industry? Ft. Abhinav Sinha, Co-Founder and CEO, ZentrumHub | Episode 4

Designing User-focused and AI-powered Shipping Solutions for Business Advancement: Insights from Ching Pei, Vice President of Product at EasyPost | Episode 3

AI, AI Platform, Artificial Intelligence, Generative AI, Industry Leaders, Interview

Key Takeaways:

Explore More Insightful Interviews