Yet another Hacker News

73 points mahmoud-almadi 1 week ago 60 comments

Hi HN, We’re Mahmoud and Alan, building Cyberdesk (https://www.cyberdesk.io/), a deterministic computer use agent for automating Windows desktop applications. Developers use us to automate repetitive tasks in legacy software in healthcare, accounting, construction, and more, by executing clicks and keystrokes directly into the desktop.

Here’s a couple demos of Cyberdesk’s computer use agent:

A fast file import automation into a legacy desktop app: https://youtu.be/H_lRzrCCN0E

Working on a monster of a Windows monolith called OpenDental (showcases agent learning process as well): https://youtu.be/nXiJDebOJD0.

Filing a W-2 tax form: https://youtu.be/6VNEzHdc8mc

Many industries are stuck with legacy Windows desktop applications, with staff plagued by repetitive tasks that are incredibly time consuming. Vendors offering automations for these end up writing brittle Robotic Process Automation (RPA) scripts or hiring off-shore teams for manual task execution. RPA often breaks due to inevitable UI changes or unexpected popups like a Windows update or a random in-app notification. Off-shore teams are often unreliable and costlier than software, plus they’re not always an option for regulated industries.

I previously built RPA scripts impacting 20K+ employees at a Fortune 100 company where I experienced first hand RPA’s brittleness and inflexibility. It was obvious to me that this was a bandaid solution to an unsolved problem. Alan was building a computer use agent for his previous startup and realized its huge potential to automate a ton of manual computer tasks across many industries, so we started working on Cyberdesk.

Computer use models can struggle with abstract, long-horizon tasks, but they excel at making context-aware decisions on a screen-by-screen basis, so they’re a good fit for automating these desktop apps.

The key to reliability is crafting prompts that are highly specific and well thought out. Much like with ChatGPT, vague or ambiguous prompts won’t get you the results you want. This is especially true in computer use because the model is processing nearly an entire desktop screen’s worth of extra visual information; without precise instructions, it doesn’t know which details to focus on or how to act.

Unlike RPA, Cyberdesk’s agents don’t blindly replay clicks. They read the screen state before every action and self-correct when flows drift (pop-ups, latency, UI changes). Unlike off-the-shelf computer use AIs, Cyberdesk runs deterministically in production: the agent primarily follows the steps it has learned and only falls back to reasoning when anomalies occur. Cyberdesk learns workflows from natural-language instructions, capturing nuance and handling dynamic tasks - far beyond what a simple screen recording of a few runs can encode.

This approach is good for both reliability and cost: reliability, because we fall back to a computer use model in unexpected situations; and cost because the computer use models are expensive and we only use them when we need to. Otherwise we leverage faster, more affordable visual LLMs for checking the screen state step-by-step during deterministic runs. Our agents are also equipped with tools like failsafes, data extraction, screen evaluation to handle dynamic and sensitive situations.

How it works: you install our open source driver on any Windows machine (https://github.com/cyberdesk-hq/cyberdriver). It communicates with our backend to receive commands (click, type, scroll, screenshot) and sends back data (screenshots, API responses, etc). You give our computer use agent a detailed natural language description of the process for a given task, just like an SOP for an employee learning a new task for the first time. The agent then leverages computer use AI models to learn the steps and memorizes them by saving each screenshot alongside its action (click on these coordinates, type XYZ, wait for page to load, etc).

The agent deterministically runs through these steps to run fast and predictably. In order to account for popups and UI changes, our agent checks the live screen state against the memorized state to determine whether it’s safe to proceed with the memorized step. If no major changes prevent safe execution of the memorized step, it proceeds; otherwise, it falls back to a computer use model with context on past actions and the remaining task.

Customers are currently using us for manual tasks like importing and exporting files from legacy desktop applications, booking appointments for patients on a desktop PMS, and data entry for filling our forms like patient profiles and such in an EMR.

We don't have a self-serve option yet but we'd love to onboard you manually. Book a demo here to learn more! (https://www.cyberdesk.io/) If you’d rather wait for the self-serve option a little later down the line, please do submit your email here (https://forms.gle/HfQLxMXKcv9Eh8Gs8) so you can be notified as soon as that’s ready. You can also check out our docs here: https://docs.cyberdesk.io/.

We’d absolutely love to hear your thoughts on our approach and on desktop automation for legacy industries!

throw03172019 1 week ago | parent

Looks great. For the EMR use cases, do you sign BAAs? Which CUA models are being used? No data retention?

mahmoud-almadi 1 week ago | parent

We sign BAAs with all our healthcare customers + all our vendors. Currently using Claude computer-use. Zero-data retention signed with both Anthropic and OpenAI, so none of the information getting sent to their LLMs ever get retained

hermitcrab 1 week ago | parent

>none of the information getting sent to their LLMs ever get retained

Is it possible to verify that?

sgtwompwomp 1 week ago | parent

Yup! We have signed certificates that explicitly state this, with all LLM providers we use.

herval 1 week ago | parent

I’m guessing OP is asking if it’s possible to verify they’re honoring the contract and deleting the data?

feisty0630 1 week ago | parent

That's not "verification" by any definition of the word.

downrightmike 1 week ago | parent

Is it a 3rd party that is verifying?

mahmoud-almadi 1 week ago | parent

We haven't looked into this kind of approach yet, but definitely worthwhile to do at some point!

bozhark 1 week ago | parent

[flagged]

mahmoud-almadi 1 week ago | parent

Right now we are taking the policies we signed with our LLM vendors as a verification of a zero data retention policy. We did also get their SOC 2 Type II reports and they showed no significant security vulnerabilities that will impact our usage of their service. We're doing our best to deliver value while taking as many security precautions as possible: our own data retention policy, encrypting data at rest and in transit, row-level security, SOC 2 Type I and HIPAA compliance (in observation for Type II), secret managers. We have other measures we plan to take like de-identifying screenshots before sending them up. Would love to get your thoughts on any other security measures you would recommend!

sentientslug 5 days ago | parent

How exactly would you do this? Be realistic

mahmoud-almadi 1 week ago | parent

Good point. In a way we can verify to a customer that we have that policy set up with them by showing them the certificate. But you are correct in that we haven't gone as far as asking for proof from Anthropic or OpenAI on not retaining any of our data but what we did do is we got their SOC 2 Type II reports and they showed no significant security vulnerabilities that will impact our usage of their service. So now we have been operating under the assumption that they are honoring our signed agreement within the context of the SOC 2 Type II report we retrieved, and our customers have been okay with that. But we are definitely open to pursuing that kind of proof at some point.

DaiPlusPlus 1 week ago | parent

Honestly, I'm surprised your lawyers let you post that here.

+1 for honesty and transparency

sethhochberg 1 week ago | parent

Typically with this sort of thing the way it really works is that you, the startup, use a service provider (like OpenAI) who publish their own external audit reports (like a SOC 2 Type 2) and then the SOC 2 auditors will see that the service provider company has a policy related to how it handles customer data for customers covered by Agreement XYZ, and require evidence to prove that the service provider company is following its policies related to not using that data for undeclared purposes or whatever else.

Audit rights are all about who has the most power in a given situation. Just like very few customers are big enough to go to AWS and say "let us audit you", you're not going to get that right with a vendor like Anthropic or OpenAI unless you're certifiably huge, and even then it will come with lots of caveats. Instead, you trust the audit results they publish and implicitly are trusting the auditors they hire.

Whether that is sufficient level of trust is really up to the customer buying the service. There's a reason many companies sell on-prem hosted solutions or even support airgapped deployments, because no level of external trust is quite enough. But for many other companies and industries, some level of trust in a reputable auditor is acceptable.

mahmoud-almadi 1 week ago | parent

Thanks for the breakdown Seth! We did indeed get their SOC 2 Type II reports and made sure they showed no significant security vulnerabilities that will impact our usage of their service.

feisty0630 6 days ago | parent

All of which has nothing to do with OpenAI or Anthropic deciding to use your data??? SOC 2 Type II is completely irrelevant.

You've got two companies that basically built their entire business upon stealing people's content, and they've given you a piece of paper saying "trust me bro".

piltdownman 6 days ago | parent

Welcome to the invalidated EU-US Safe Harbour, the invalidated EU-US Privacy Shield, and the soon-to-be invalidated EU-US Data Privacy Framework (DPF) and Transatlantic Data Privacy Framework (TADPF).

Digital sovereignty and respect for privacy and local laws are the exception in this domain, not the expectation.

As Max Schrems puts it "Instead of stable legal limitations, the EU agreed to executive promises that can be overturned in seconds. Now that the first Trump waves hit this deal, it quickly throws many EU businesses into a legal limbo."

After recently terrifying the EU with the truth in an ill-advised blogpost, Microsoft are now attempting the concept of a 'Sovereign Public Cloud' with a supposedly transparent and indelible access-log service called Data Guardian.

https://blogs.microsoft.com/on-the-issues/2025/04/30/europea...

https://www.lightreading.com/cloud/microsoft-shows-who-reall...

If Nation States can't manage to keep their grubby hands off your data, private US Companies obliged to co-operate with Intelligence Apparatus certainly won't be.

mahmoud-almadi 6 days ago | parent

You make valid points. At the end of the day we're focused on delivering real value while taking every security precaution we can reasonably take and build new technology at the same time. Eventually as we grow we'll be able to do full self hosting for our customers and perhaps even spin up our own LLMs in our own servers. But until then, we can only do so much.

mahmoud-almadi 6 days ago | parent

I appreciate your skepticism. At the end of the day we're focused on delivering real value while taking every security precaution we can reasonably take and build new technology at the same time. Eventually as we grow we'll be able to do full self hosting for our customers and perhaps even spin up our own LLMs in our own servers. But until then, we can only do so much.

bozhark 1 week ago | parent

Nope.

rkagerer 1 week ago | parent

Personally I think this approach is flawed because it runs in the cloud. If it were an agent I could run locally I'd be much more interested.

mahmoud-almadi 1 week ago | parent

Are you referring to the LLM being used or where the actions (click, type, etc) are being executed? The actual actions can be executed on any windows machine, so the actual execution can take place locally on your device. The LLMs we're using right now are cloud LLMs. We haven't done an LLM self hosting option yet. Can I ask what reservations you have about running in the cloud? We have zero-date retention signed with our LLM vendors, so none of the data getting sent to them ever gets retained.

iptq 1 week ago | parent

If this can't run full-local, isn't that basically a botnet? You're talking about installing a kernel-level driver that receives instructions on what to do from a cloud service.

mahmoud-almadi 1 week ago | parent

Great point! Yes you are correct in that the actual "agent" lives in the cloud and its actions are executed by a proxy running on the desktop. Hopefully at some point we can set up a straightforward installation procedure to have the AI models running entirely on the desktop, but that's constrained by desktop specs for now. VMs and desktops with the specs to handle that would be prohibitively expensive for a lot of teams trying to build these automations.

rm_-rf_slash 1 week ago | parent

Out of curiosity, what would the minimum specs need to be in order to run this locally?

My PC is just good enough to run a DeepSeek distill. Is that on par with the requirements for your model?

sgtwompwomp 1 week ago | parent

There isn't a viable computer use model that can be ran locally yet unfortunately. Am extremely excited for the day that happens though. Essentially the key capability that makes a model a computer use model is precise coordinate generation.

So if you come across a local model that can do that well, let us know! We're also keeping a close watch.

rkagerer 1 week ago | parent

What would it take to train your own?

mahmoud-almadi 1 week ago | parent

I don't know too much about training your own computer use model other than it would probably be a very hefty, very expensive task.

However, I believe ByteDance released UI-TARS which is an excellent open source computer use model according to some articles I read. You could run that locally. We haven't tested it so I wouldn't know how it performs, but sounds like it's definitely worth exploring!

ciaranmca 1 week ago | parent

Haven’t looked into them much but I thought the Chinese labs had released some for this kind of thing

mahmoud-almadi 6 days ago | parent

You are correct in that ByteDance did releas UI-TARS which sounds like a really good open source computer use model according to some articles I read. You could run that locally. We haven't tested it so I wouldn't know how it performs, but sounds like it's definitely worth exploring!

rkagerer 1 week ago | parent

I'm talking about the LLM (and any other infrastructure involved). Reasons are:

- Pricing. If I grow to do this at scale, I don't want to be paying per-action, per-month, per-token, etc.

- Privacy. I don't want my data, screenshots, whatever being sent to you or the cloud AI providers.

- Control. I don't want to be vulnerable to you or other third parties going bankrupt, arbitrarily deciding to kill the product or it's dependencies, or restructuring plans/pricing/etc. I also want to be able to keep my day to day operations running even if there's a major cloud outage (that's one reason we're still using this "old fashioned", non-cloud software in the first place).

I think I'm simply not your target market.

I advise several companies who could be (they run "legacy" software with vast teams of human operators whose daily tasks include some portion of work that would be a good candidate for increased automation), but most of them are in a space where one or more of the above factors would be potential deal breakers.

The retention agreements between you and your vendors are great (I mean that sincerely), but I'm not party to them so they don't do anything for me. If you offered a contractual agreement with some teeth in it (eg. underwritten or bond-backed to the tune of several digits, committing to specific security-related measures that are audited, with a tacit acknowledgement any proven breach of contract in and of itself constitutes damages) it could go a long way to address the privacy issues.

In terms of pricing it feels like the core of your product is an outside vendor's computer-operating AI model, and you've written a prompt wrapper and plumbing around it that ferries screenshots and directives back and forth. This could be totally awesome for a small scale customer that wants to dip their toes into AI automation and try it out as a turnkey solution. But the moat doesn't seem very big, and I'd need to be convinced it's a really slick solution in order to favour that route instead of rolling my own wrapper.

Please don't take this the wrong way, it's just one datapoint of feedback and I do wish you luck with your venture.

mahmoud-almadi 1 week ago | parent

These points you're making are excellent!

Self hosting is inevitably a part of our roadmap. Cyberdesk will have a future where we host our entire agentic framework on your own servers. AI models and the whole backend included.

I can totally see myself having the same preferences as you if I were you with regards to cost, privacy, and control.

The unique value in Cyberdesk lies beyond being a wrapper around a computer use AI model. Our intelligence caching is built on large evals that help us produce prompts that are highly reliable for the intelligent caching to work well in the first place. On top of that there are several tools that allow the agent to be useful (import/export files, failsafes, taking actions using data that was read during the same run). Rebuilding Cyberdesk, while possible, will require several weeks at the very least of very rapid iteration. So for a dev team that wants to build the best computer use agent in the world, I guess that's doable. But for a team trying to be the best "X" in their particular industry, it's probably going to be a time sink that will take away from their ability to compete well in their space, hence why Cyberdesk is a great choice for them.

I hope you keep an eye on what we're doing! I really like your insights here and I'm curious to see what you think as we evolve over the next months and years. Maybe when we do full self hosting you'll be a customer :)

MortyWaves 1 week ago | parent

Frankly quite insulting to call any Windows app legacy

mahmoud-almadi 1 week ago | parent

sorry it came off that way! could you elaborate on that thought?

boombapoom 1 week ago | parent

windows itself is legacy.

mattfrommars 1 week ago | parent

Looks great to automate workload for Windows desktop application. I'd love to understand more deeply how your application works, so the set of commands your backend send is click, scroll, screenshot. Does it send command to say type character into an input field? How is it able to pin point a text field from a screenshot? Is LLM reliable to pin point x and y to click on a field?

Also, to have this run in a large scale, Does it become prohibitively expensive to run on daily basis on thousand of custom workflows? I assume this runs on the cloud.

sgtwompwomp 1 week ago | parent

Thanks! And yes, so our pathfinder agents utilize Sonnet 4's precise coordinate generation capabilities. You give it a screenshot, give it a task, and it can output exact coordinates of where to click on an input field, for example.

And yes we've found the computer use models are quite reliable.

Great questions on scale: the whole way we designed our engine is that in the happy path, we actually use very little LLMs. The agent runs deterministically, only checking at various critical spots if anomalies occurred (if it does, we fallback to computer use to take it home). If not, our system can complete an entire task end to end, on the order of less than $0.0001.

So it's a hybrid system at the end of the day. This results in really low costs at scale, as well as speed and reliability improvements (since in the happy path, we run exactly what has worked before).

mattfrommars 3 days ago | parent

Thanks for the reply. I lost you in this part, Great questions on scale: the whole way we designed our engine is that in the happy path, we actually use very little LLMs. The agent runs deterministically, only checking at various critical spots if anomalies occurred (if it does, we fallback to computer use to take it home)

I assume you send screenshot to claude for nest action to take, how are you able to reduce this exact step by working deterministically? What is the is deterministic part and how you figure it out?

sgtwompwomp 2 days ago | parent

So what I meant is this: When you run our Cyberdesk agent the first time, it runs with the computer use agent. But then once that’s complete, we cache every exact step it took to successfully complete that task (every click, type, scroll) and then simply replay that the next time.

But during that replayed action, we do bring in smaller LLMs to just keep in check to see if anything unexpected happened (like a popup). If so, we fall back to computer use to take it home.

Does that make sense? At the end of the day, our agent compiles down to Pyautogui, with smart fallback to the agent if needed.

mattfrommars 2 days ago | parent

Hi,yes. It makes sense now. To cache the steps and reply. Very efficient strategy than to run the step each time using LLM

deepdarkforest 1 week ago | parent

Congrats! I think the space is very interesting, I was a founder of a similar windows CUA infra/ RPA agents but pivoted. My thoughts:

1) The funny thing about determinism is how deterministic you should be when to break, its kind of a recursive problem. agents are inherently very tough to guardrail on an action space so big like in CUA. The guys from browser use realized it as well and built workflow-use. Or you could try RL or finetuning per task but is not viable(economically or tech wise) currently.

2) As you know, It's a very client facing/customized solution space You might find this interesting, it reflects my thoughts in the space as well. Tough to scale as a fresh startup unless you really niche down on some specific workflows. https://x.com/erikdunteman/status/1923140514549043413 (he is also building in the deterministic agent space now funnily enough) 3) It actually is annoyingly expensive with Claude if you break caching, which you have to at some point if you feed in every screenshot etc. You mentioned you use multiple models (i guess uitars/omniparser?), but in the comments you said claude?

4) Ultimately the big bet in the RPA space, as again you know, is that the TAM wont shrink a lot due to more and more SAP's, ERP's etc implementing API's. Of course the big money will always be in ancient apps that wont, but then again in that space, uipath and the others have a chokehold. (and their agentic tech is actually surprisingly good when i had a look 3 months ago)

Good luck in any case! I feel like its one of those spaces that we are definitely still a touch too early, but its such a big market that there is plenty of space for a lot of people.

mahmoud-almadi 1 week ago | parent

Thanks! Really appreciate the awesome thoughts.

1) You're totally right about this problem! We handle this issue with intelligent caching and heavy prompt/context engineering. These measures have been controlling agent behavior pretty well.

2) The key to scaling is building a tool that developers can learn and pick up themselves and that's what we're seeing here. By understanding how to use the tools we built to control agent behavior, developers have been able to leverage our docs to achieve desirable behaviors from our computer use agents when using Cyberdesk.

3) Surely you're correct about cost here as well, but with well defined workflows this will only happen a minority of the times the agent runs.

4) Great point! The beauty of what computer use made possible is it can solve problems previously unsolved by RPA altogether. The TAM will increase significantly when computer use agents start working really well. We've already seen this with our customers: they're able to build automations with Cyberdesk that they weren't able to using RPA. So while the TAM might bleed some ERPs and legacy apps that will implement APIs, I think it's going to grow at a much faster rate than it will shrink.

mwcampbell 1 week ago | parent

Have you looked at using accessibility APIs, such as UI Automation on Windows, to augment screenshots and simulated mouse clicks?

throw03172019 1 week ago | parent

Isn’t this an optional feature for developers? They can disable it / remove the names of the buttons, etc to make RPA harder?

mahmoud-almadi 6 days ago | parent

Not yet! Vision only has been doing pretty well so far, but we're definitely looking into that at some point. Thanks for the suggestion!

gerdesj 1 week ago | parent

Autoit must be a good 20 years old: https://www.autoitscript.com/site/

sgtwompwomp 1 week ago | parent

Unfortunately these scripting tools just are untenable when dealing with so many desktop flows that all have changing UIs and random popups. You end up having to repair all of them all the time, in fact there's a whole consulting industry out there just to do this all day.

The whole idea of Cyberdesk is the prompt is the source of truth, and then once you learn a task once via CUA, the system follows that cache most of the time until you have to fall back to CUA, which follows the prompt. And that anomaly is also cached too.

So over time, the system just learns, and gets cheaper and faster.

gerdesj 1 week ago | parent

I used AutoIT to remove old AV from roughly 6000 PCs across 20 odd countries back in 2002. I still use it from Zenworks on some customer sites, 20+ years later.

Old school Windows apps are not "flowing" they generally use a toolkit and AutoIT is able to use the Windows APIs to note window handles, or the text in various widgets and so on and act on them.

These are not complicated beasts - they are largely determinant. If you have to go off piste and deal with moguls, you have a mode called "adlib" where you deal with unusual cases.

I find it a bit unpleasant that you describe part of my job as "untenable". I'm sure you didn't mean it as such. I'm still just as cheap as I was 20 years ago and probably a bit quicker now too!

MetaWhirledPeas 1 week ago | parent

Can it do assertions? This could be useful for testing old software.

sgtwompwomp 1 week ago | parent

Yup, a few of our clients have a need to verify something in the software, so we support an agentic step where we look at the screen and can verify whether something exists, or whatever a step was completed, etc!

kjellsbells 6 days ago | parent

This is great, but a part of me wonders if our industry isnt putting a bandaid on a problem that we ourselves created.

Consider your typical early-2000s era Windows app. It would expect a mouse, but for power users, keyboard shortcuts would be available for every action, even if clunky. For example, Alt F tab tab tab to get to some input field, enter text, tab Alt R Return.

By about 2015 these were all straightforwardly scriptable with AutoHotkey amd similar tools.

But too late: by 2015 even Windows users were using web apps, where the keyboard bindings are variable or non existent, where the entire UI can change overnight, etc. I see some RPA approaches desperately trying to decode the DOM or match pixel elements. It's wild, as you point out.

I guess what I'm wondering if going after legacy Windows apps is a small TAM already largely solved, whereas the SPA/webapp market is gigantic, growing every day, and woefully, miserably, broken as far as automation is concerned.

mahmoud-almadi 6 days ago | parent

Great point! The issue with hotkeys is a series of hardcoded keyboard shortcuts is likely to fail due to unexpected popups which are quite frequent. Cyberdesk can automate entire workflows with keyboard shortcuts only, and we typically push for that, but where it really shines is its ability to handle anomalous situations and ensure the workflow works every time. As for webapps, Cyberdesk is equipped to automate them just as much as it can automate desktop apps, and it's been performing just as well on web apps as it is on desktop apps. To our agent a webapp and a desktop app are the same thing. just a bunch of shapes and text on the screen. We just emphasize the legacy apps in our messaging because there's a strong need for that. Almost every customer we've talking to has tried RPA but to no avail

rolstenhouse 6 days ago | parent

Congrats on the launch! I've tried building a windows automation script myself but found existing solutions quite cumbersome booking appointments and fetching information on a desktop SMS.

Can I monitor/manage this remotely? I'm not on site with the client and previously tried to manage through AnyDesk but the client often turned off the machine.

Also is there anyway to run this so that it won't interrupt workflows while someone is using the machine? I imagine a solution could just be having the client run an extra computer that's dedicated for this or running after hours on the local machine.

Scheduled a demo

mahmoud-almadi 6 days ago | parent

Thanks for scheduling a demo! Looking forward to chatting soon.

You can definitely monitor this remotely by looking at the logs of the runs which includes screenshots, but we don't currently support live streaming of the desktop screen. Our customers usually do that via RDP which is usually straightforward.

Because our agent directly controls the keyboard, mouse, and display, it needs exclusive access (no concurrent users). Our customer typically run it on either a dedicated spare desktop at the client site or a dedicated cloud VM (managed by them or provisioned by us)

tomgs 6 days ago | parent

Great idea!

You have not social share preview image on the homepage: https://www.opengraph.xyz/url/https%3A%2F%2Fwww.cyberdesk.io...

zkxjzmswkwl 6 days ago | parent

You do not need a driver to do what you're doing. It's just slightly easier with a driver.

You can accomplish this from usermode and you wouldn't give potential customers (anyone who plays modern games) a non-starter for your product.

throw03172019 5 days ago | parent

How do you determine what action is to use the “cached” agent to reduce cost/increase speed?

sgtwompwomp 2 days ago | parent

So once you run a workflow (prompt) once, then that trajectory is cached.

Run the workflow again and it’ll run through that cached trajectory as best as it can, falling back to computer use if needed.

iamcreasy 4 days ago | parent

Congratulations on the launch!

How does it know when to stop and ask a human to intervene?

sgtwompwomp 2 days ago | parent

Our agent would have a tool to essentially bring in the human. Not built this yet, but the closest thing we do have is that our agent can declare a task as failed if it determines it can’t proceed (based on your instructions).

More on this soon! How would you imagine this would be useful?