76 points yehiaabdelm 14 hours ago 94 comments
I use Claude Code and Codex, but I haven't been able to enter flow state like I can when I hand-write code.
This is kind of ironic to me since AI should be a bicycle for the mind, but right now it feels like a bicycle that just brakes abruptly every couple minutes. I stop, wait, review, prompt again.
Is there anyone exploring something fundamentally different than the prompt response loop we have today?
I actually think the idea of a tab model is directionally better than prompt response.
Would love to hear about any startups, personal experiments, etc.
Jimmc414 14 hours ago | parent
also, let the model verify itself. don't give it a vague objective; give it clear exit criteria for goals and let it loop until it gets there. so much of the orchestration scaffolding seems like massive technical debt.
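A minimal sketch of the "clear exit criteria, loop until it gets there" idea, assuming the exit criteria can be written as plain predicate functions and that `agent_step` is a hypothetical stand-in for one model turn. Nothing here is a particular tool's API.

```python
from typing import Callable, Dict, List, Tuple

# An exit criterion is a named predicate over the current state of the work.
Criteria = Dict[str, Callable[[dict], bool]]
# Hypothetical agent step: receives the state plus the names of failing criteria.
AgentStep = Callable[[dict, List[str]], dict]

def run_until_done(state: dict, agent_step: AgentStep, criteria: Criteria,
                   max_steps: int = 10) -> Tuple[dict, int]:
    """Let the agent loop until every exit criterion passes (or give up)."""
    steps = 0
    while True:
        failing = [name for name, check in criteria.items() if not check(state)]
        if not failing:
            return state, steps
        if steps >= max_steps:
            raise RuntimeError(f"still failing after {steps} steps: {failing}")
        # Only the concrete failures are fed back, never a vague objective.
        state = agent_step(state, failing)
        steps += 1

# Toy stand-in for a model: each step makes one more test pass.
def toy_agent(state: dict, failing: List[str]) -> dict:
    return {**state, "passing": state["passing"] + 1}

criteria = {"all tests pass": lambda s: s["passing"] >= s["total"]}
final, steps = run_until_done({"passing": 2, "total": 5}, toy_agent, criteria)
print(final, steps)  # {'passing': 5, 'total': 5} 3
```

The verifier is ordinary code rather than the model's own judgement, which is what keeps the loop from declaring victory early.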
oddly, I do the opposite of a lot of conventional advice when it comes to models. I use no memory; I think there is something similar to context rot when everything is stored. I like creating markdown files as memory that the model can grep if needed. I also haven't found a real use for hooks yet; I have tried, but they always seem to get in the way. skills, on the other hand, are very undervalued. they are so much more powerful than many realize. I used to think agents were where the power was. I think it's actually skills. agents are really for context preservation. skills are what increase capabilities.
I'm not even talking about the quantity of items in memory; I mean dilution of intent. I really love a model with a clean slate and only the items it needs. I fear the memory steers the model into areas that might not be what I want with the current prompt.
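A sketch of "markdown files as memory that the model can grep", assuming the notes live as `*.md` files under one directory. The directory layout and the `grep_memory` helper are hypothetical. The point is that only the matching lines enter the context, not the whole memory store.

```python
import re
from pathlib import Path
from typing import List, Tuple

def grep_memory(memory_dir: str, pattern: str, context: int = 1) -> List[Tuple[str, int, str]]:
    """Search markdown notes and return (file, line number, snippet) for each match.

    Only the matching snippets are handed to the model, so notes stay out of
    context until a query actually needs them.
    """
    regex = re.compile(pattern, re.IGNORECASE)
    hits = []
    for path in sorted(Path(memory_dir).rglob("*.md")):
        lines = path.read_text(encoding="utf-8").splitlines()
        for i, line in enumerate(lines):
            if regex.search(line):
                lo, hi = max(0, i - context), min(len(lines), i + context + 1)
                hits.append((path.name, i + 1, "\n".join(lines[lo:hi])))
    return hits
```

Exposed to an agent as a tool, this gives it memory on demand without preloading everything into a "memory" section of the prompt.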
progressive disclosure is a big one. you can make context available but have it loaded only when needed, like lazy loading for prompt engineering. skills are meant to instruct the model how to do something specific that is not in its training data, like how to access my proprietary system or how to interface with a custom program. you can embed templates in skills, and you can embed code that executes in skills so that only its output is loaded into context. skills expand capabilities; agents constrain context.
(constraining context is a very good thing btw; I don't mean to imply that agents are somehow inferior to skills)
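Progressive disclosure works like lazy loading. Only each skill's name and one-line description go into the prompt up front, and the full body is loaded when the model asks for that skill. The `Skill` class and the loader below are hypothetical illustrations, not the on-disk skill format of any particular agent.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional

@dataclass
class Skill:
    name: str
    description: str               # always in context: one line per skill
    load_body: Callable[[], str]   # deferred: instructions, templates, or script output
    _body: Optional[str] = field(default=None, repr=False)

    def body(self) -> str:
        if self._body is None:     # loaded on first use only, then cached
            self._body = self.load_body()
        return self._body

def initial_context(skills: Dict[str, Skill]) -> str:
    """What goes into the prompt up front: names and descriptions, nothing else."""
    return "\n".join(f"- {s.name}: {s.description}" for s in skills.values())

def expand(skills: Dict[str, Skill], name: str) -> str:
    """Called only when the model decides it needs a skill."""
    return skills[name].body()

loads: List[str] = []

def load_internal_api_skill() -> str:
    loads.append("internal-api")   # stands in for reading a file or running a script
    return "To call the internal API, first request a session token."

skills = {"internal-api": Skill("internal-api", "How to call our proprietary API",
                                load_internal_api_skill)}
print(initial_context(skills))
```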
cedws 14 hours ago | parent
zmgsabst 14 hours ago | parent
So far that’s been much nicer for anything large or complex, because I was spending all my time on context piping.
hsn915 13 hours ago | parent
This is basically like queueing up prompts.
I wish Claude Code had something like that built in, like a "user ideas scratchpad".
Bossie 13 hours ago | parent
My current flow relies heavily on Matt Pocock's Skills and Sandcastle project, and I find them highly valuable in practice: I use grilling (/wayfind) to work my way into a spec and extract issues. Those issues live in Linear projects. I point my Sandcastle setup at such Linear projects (or loose issues), and the result is an MR.
Currently at the point of self-improving the prompts and Sandcastle set-up with a retrospective pass of the logs.
AJFlan 13 hours ago | parent
kybernetikos 13 hours ago | parent
Bogdanp 13 hours ago | parent
avilay 13 hours ago | parent
To answer your question - I discuss the approach with Claude Code (e.g., should I implement my own ACT model in JAX or PyTorch, Python or Rust or Julia, etc.). Then write the initial part of the code myself. Opening up a blank vscode is a simple joy of life I refuse to give up :-) I'll ask Claude for advice if I get stuck, it will helpfully offer to write that code for me, I obstinately decline. Eventually, I'll get bored of some minutiae or other, at which point I'll ask Claude to complete just that part of it.
thinkingemote 13 hours ago | parent
Sometimes using an LLM can help with these, and sometimes it can feel like cheating myself out of a good thing; I'm not entirely sure where the borders are. It could also be related to a sense of ownership or pride in one's work and seeing the value in doing quality work.
tolg 13 hours ago | parent
avilay 12 hours ago | parent
* https://youtu.be/VbUFMYs0kXQ?si=xiNw4ZFlla8k-p7w The person who gives this talk (Rian Doris) has a good newsletter that I still read. I just checked their website and it has gone in full commercial mode, so YMMV.
* https://www.ted.com/talks/elizabeth_gilbert_your_elusive_creative_genius
* https://www.amazon.com/dp/0465074871
* https://www.betterup.com/blog/meaning-of-personal-values
Scroll_Swe 1 hour ago | parent
I'm not a programmer, but I very much enter a flow state working on tickets, or playing a video game on higher difficulties when everything "clicks"
doubled112 59 minutes ago | parent
Between having kids and a work situation a few years back, it is like my brain expects to be interrupted at any moment, so won't get there.
thinkingemote 13 hours ago | parent
"Software engineering at the tipping point" https://www.youtube.com/watch?v=2n41YjR5QfU
chilmers 13 hours ago | parent
sixothree 13 hours ago | parent
fabioz 13 hours ago | parent
So, I actually decided to try to tackle it myself and worked some months (full time) on it.
https://beolis.com is the result of that, it's a local cli in a kanban board style with a remote server to keep the team on track (I've been using it myself for some time and actually started to ask some friends to use it just yesterday -- feedback very welcome, I still wanted to do some additional things before asking more people to use it, but oh well, I'm a fan of building in public anyways and it's probably better to have feedback sooner rather than later).
The main point there is that you work mostly in the ticket description (your own spec) and the plan (the spec as the agent sees it, generated with a custom workflow), and then use another custom workflow to implement it (you can choose how you want it; https://beolis.com/blog/post/custom-coding-workflows has some info on what I'm using myself).
As a result, at least for me, I do spend more time immersed in a flow state, although I'm in that state while writing the specs and reviewing code. In some cases, when things get more complicated, it's more work to write the spec in a way the agent can handle than to just dive into the code, so I still have to switch into "code" mode; agents are definitely not perfect.
I guess I'm lacking docs on how to use it effectively. I plan to create a video next week and post it on the blog, so if you're interested, keep an eye on it ;)
integrii 13 hours ago | parent
You get some amazing results with teams of AIs if you do it right. The key is to control behavior with what integrations and responsibilities each agent has. That way they naturally adapt, delegate, fact check each other, and generally act more autonomously.
This is already running the automated news site ainews.personastack.ai complete with social media posts 100% automated.
It also runs the issue triage, coding, reviews, and releases for the Kuberhealthy open source CNCF project, which is another thing of mine.
I don't think the next step is really smarter models. It's how we make the models more effective, and teams, when done right, net the best results I've seen.
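One way to read "control behavior with what integrations and responsibilities each agent has" is as a capability table. Each agent may call only its own tools, and anything outside that set gets delegated to an agent that has it. The roles and tool names below are invented for illustration and are not taken from the setup described above.

```python
from typing import Dict, Set

# Hypothetical roles and the integrations each agent is allowed to use.
CAPABILITIES: Dict[str, Set[str]] = {
    "researcher": {"web_search", "read_docs"},
    "writer": {"edit_draft"},
    "fact_checker": {"web_search", "comment"},
    "publisher": {"post_social", "publish_site"},
}

def route(agent: str, tool: str) -> str:
    """Return which agent should perform `tool` when `agent` asks for it.

    If the requesting agent lacks the integration, the request is delegated
    to the first agent (in table order) that has it.
    """
    if tool in CAPABILITIES[agent]:
        return agent
    for other, tools in CAPABILITIES.items():
        if tool in tools:
            return other
    raise PermissionError(f"no agent has the {tool!r} integration")
```

Because the writer cannot publish, every draft must pass through another agent. That structural constraint produces the "delegate and check each other" behaviour without any prompt asking for it.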
Hoping to get noticed here soon, but it's extremely hard to do solo I'm finding.
dnikolovv 13 hours ago | parent
sigmoid10 13 hours ago | parent
fraXis 13 hours ago | parent
Can you elaborate more about its development? How much do 110B tokens equate to in $$$? What LLM did you prefer most during development? Any suggestions for other solo developers trying to launch their LLM-built product?
chrisjj 13 hours ago | parent
Fixing that for you.
I haven't been able to enter flow state like I can when I write code.
brador 13 hours ago | parent
Why should AI be limited to human time? Is a mountain? A galaxy?
philbo 13 hours ago | parent
It's still very wip, I spent a couple of weekends on it so far, but I'm working on a harness that eschews autonomy and instead aims to work as a pair programming partner. Key to that are distinct "driver" and "navigator" modes, with the capacity to flip between them rapidly.
https://gitlab.com/philbooth/opair
(not really usable yet, but after tomorrow's session I expect to be developing opair in opair, which is mildly exciting)
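The driver/navigator split can be sketched as a tiny state machine. In navigator mode the human types and the model only advises. In driver mode the model may edit and the human reviews. This class is a hypothetical illustration and is not how opair is implemented.

```python
from enum import Enum
from typing import List

class Mode(Enum):
    DRIVER = "driver"        # the model writes code, the human reviews
    NAVIGATOR = "navigator"  # the human writes code, the model advises

class PairSession:
    def __init__(self, mode: Mode = Mode.NAVIGATOR) -> None:
        self.mode = mode
        self.log: List[str] = []

    def flip(self) -> Mode:
        """Swap roles in a single step, so switching stays cheap."""
        self.mode = Mode.DRIVER if self.mode is Mode.NAVIGATOR else Mode.NAVIGATOR
        return self.mode

    def model_may_edit(self) -> bool:
        # Only in driver mode does the model get write access to the buffer.
        return self.mode is Mode.DRIVER

    def handle(self, model_output: str) -> str:
        """Treat model output as an edit or a mere suggestion, depending on mode."""
        kind = "edit" if self.model_may_edit() else "suggestion"
        self.log.append(f"{self.mode.value}:{kind}")
        return kind
```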
jesse_ash 12 hours ago | parent
bob1029 13 hours ago | parent
I think there are 10-100x productivity gains lurking in here. It is very expensive for a human to reserialize their mental state into a prompt each time a task needs working on. An agent can do this ~instantly and with high frequency 24/7. The higher the rate of evaluation the less change has to be dealt with between any two iterations. So, the likelihood that a given iteration needs human help goes down as you increase the rate of evaluation per unit of wall clock time. Tighter and faster control loops tend to require less severe corrective measures than slow and sloppy ones.
This is the most plausible reason for so many tokens in the future. I can actually see a million tokens per second making sense. I have a pretty good idea how I'd approach this if I actually had access to this kind of infrastructure. 1Mtok/s is baby tier in terms of raw information theory. The politics of employing a system like this are far more terrifying to me than any technological aspects. Humans really like having control over things, even when that control is pure downside for the business.
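The control-loop argument can be illustrated with a toy simulation. Work drifts away from the target by roughly one unit per tick, and an evaluator checks every `interval` ticks and corrects whatever drift has accumulated. The drift model is an assumption made purely for illustration.

```python
import random
from typing import List

def corrections(interval: int, ticks: int = 1000, seed: int = 0) -> List[float]:
    """Size of each correction when the state is checked every `interval` ticks."""
    rng = random.Random(seed)
    drift, sizes = 0.0, []
    for t in range(1, ticks + 1):
        drift += 1.0 + rng.uniform(-0.5, 0.5)   # assumed per-tick drift in [0.5, 1.5]
        if t % interval == 0:
            sizes.append(drift)                 # correction needed at this check
            drift = 0.0                         # evaluator pulls the work back on target
    return sizes

fast, slow = corrections(interval=1), corrections(interval=10)
print(f"check every tick: max correction {max(fast):.2f}")
print(f"check every 10 ticks: max correction {max(slow):.2f}")
```

Both runs see the same drift sequence, so the total amount corrected is the same. What changes is the size of each individual correction: checking every tick never needs a fix larger than 1.5 units, while checking every 10 ticks always needs at least 5.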
lexoj 13 hours ago | parent
just-tom 13 hours ago | parent
I think my next step is to perform the grilling session inside the front end, currently I perform it in my terminal and then paste in the front end.
LikelyLiar 12 hours ago | parent
just-tom 12 hours ago | parent
The challenge I dealt with is actually running all services on the same machine: 4 services, each needing its own port, 2 of them needing Docker. Then I inject their sandbox URLs into env vars so they can communicate. All that to have a fully working app with all services running; I just go to the public web app URL and test.
Nonetheless, I'll look into CC Web, thanks for the mention.
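The port juggling can be sketched by asking the OS for free ports and deriving each service's env vars from the result. The service names, the `_PORT`/`_URL` naming convention, and the `localhost` host are assumptions. In a real sandbox you would substitute whatever public URL the sandbox exposes.

```python
import socket
from typing import Dict, List

def free_port() -> int:
    """Ask the OS for an unused TCP port by binding to port 0."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.bind(("127.0.0.1", 0))
        return s.getsockname()[1]

def service_env(services: List[str], host: str = "localhost") -> Dict[str, str]:
    """Assign each service a port and expose every service's URL via env vars."""
    ports: Dict[str, int] = {}
    for name in services:
        port = free_port()
        while port in ports.values():   # guard against the OS handing out a repeat
            port = free_port()
        ports[name] = port
    env = {f"{name.upper()}_PORT": str(port) for name, port in ports.items()}
    env.update({f"{name.upper()}_URL": f"http://{host}:{port}" for name, port in ports.items()})
    return env

# Hypothetical services; the Docker-based ones would get `docker run -p PORT:CONTAINER_PORT`.
env = service_env(["web", "api", "worker", "db_proxy"])
```

There is an inherent race here: a port is released before the service binds it, so another process could grab it in between. For a single-developer sandbox that is usually acceptable.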
magicmadrid00 13 hours ago | parent
- Do your thinking alone. (AI part: search, understanding)
- Specing. (AI part: search, understanding, completing some text)
- Coding like the old days. (AI part: search, understanding, code examples)
- Okay, now I have a good idea of how my feature is going to work
- Look for fluff code and delegate it to AI to write/review it.
- Focus on the part of the code I want to have fun doing.
- Review.
- Repeat.
It’s slower than the approach of writing specs and letting AI do the rest while your role is limited to code review. However, I’m more in control of what I build, I can explain what I built better than anyone else, and I build up my knowledge. (Also, I have fewer problems, because there's less code, haha.)
Will I go for the full agentic way? Maybe, but I will find a way to slow it down so I can stay in control.
jwardbond 40 minutes ago | parent
I felt that, by using the "full agentic way" I am implicitly accepting the fact that all the knowledge I have right now is all the knowledge I will ever need or want to have (with the exception of new knowledge on how to ask AI to do things, I guess).
This seems like a nice way to enable yourself with AI, but not replace your brain completely.
danpalmer 13 hours ago | parent
Rather than asking them to write web apps in webby languages with open source frameworks etc., provide a very fixed, on-rails development process where everything is abstracted away. Accept that it'll be less powerful, but take the trade-off that it'll hopefully be faster and produce much more controllable software.
Concrete example: why do we let the LLM choose a database, schema, migration procedure, library, etc.? We could decide to only support one database, enforce schema design (such as every table containing access control), enforce a migration process, enforce a library, even do schema design in a fixed config file rather than arbitrary DDL. Same for auth, deployments, even UI.
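A sketch of "schema design in a fixed config file rather than arbitrary DDL". The LLM may only emit a declarative table spec, and a validator rejects anything that breaks house rules, such as a table without an access-control column. The config shape, the `owner_id` rule, and the allowed types are all hypothetical.

```python
from typing import Dict, List

ALLOWED_TYPES = {"text", "int", "bool", "timestamp", "uuid"}
REQUIRED_COLUMNS = {"id": "uuid", "owner_id": "uuid"}  # every table carries access control

def validate_schema(schema: Dict[str, Dict[str, str]]) -> List[str]:
    """Return a list of rule violations; an empty list means the schema is accepted."""
    errors = []
    for table, columns in schema.items():
        for col, col_type in REQUIRED_COLUMNS.items():
            if columns.get(col) != col_type:
                errors.append(f"{table}: missing required column {col} {col_type}")
        for col, col_type in columns.items():
            if col_type not in ALLOWED_TYPES:
                errors.append(f"{table}.{col}: type {col_type!r} not allowed")
    return errors

# What the LLM would be allowed to produce: data, not DDL.
schema = {
    "invoices": {"id": "uuid", "owner_id": "uuid", "amount": "int", "paid": "bool"},
    "notes": {"id": "uuid", "body": "text", "meta": "jsonb"},
}
for problem in validate_schema(schema):
    print(problem)
```

Migrations and DDL would then be generated from the validated config by fixed tooling, so the model never gets to choose them.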
villaaston1 12 hours ago | parent
Though some frontend decisions are a bit more open.
danpalmer 12 hours ago | parent
I'm thinking of going far further, to the point that perhaps we should use a new language designed only for web-app development. I'm thinking about removing almost all options so that the LLM only gets to write custom business logic and data modelling and doesn't need to do much else. Again this is all at the cost of being more generally applicable, but I see a lot of software that is fundamentally CRUDL and it's still hard to build well, and I also see a lot of LLMs reinventing the wheel but implementing too many sides on that wheel. They need guardrails.
yehiaabdelm 13 hours ago | parent
I'm trying to do the same amount of work faster, not do work in parallel or orchestrate agents. I'm not against letting the model go off and do things on its own; that has its time and place.
But if I can do something in 15 minutes instead of 1 hour, without the annoying prompt-response loop, without the feeling that there could be blind spots, and while keeping all (or at least most) of the context in my head, that's a bigger win than spinning up 5 agents to do different things.
neepoPhantom 13 hours ago | parent
Incipient 13 hours ago | parent
Three tiers: philosophy, spec, design, in increasing detail. Design files include DB model explanations and pseudocode/function headers, that level of detail.
For each thing I need to change, I have a prompt ready to go that asks the agent to follow about 5 steps, and it outputs a 'reviewfile' with details of what it thinks about the thing I posited. I review its output. I have another prompt ready to then get an agent to generate a taskfile and update the design documentation. The taskfile explains in great detail what has changed and what needs to be implemented. I review the taskfile and the git diffs of the design doc changes. Finally, an agent implements the taskfile. I review all changed code and commit.
It gets there, but it still definitely misses some stuff. It's very adequate for an MVP, I'm finding.
Edit: this seems to only work with Opus. Sonnet can't do it (maybe Opus is seriously compensating for an awful approach and I'm just lucky?)
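The reviewfile → taskfile → implement flow can be sketched as a fixed pipeline. Each agent stage writes a file, and nothing advances until a human approves it. The stage functions below are placeholder lambdas, since the actual prompts are not shown in the comment.

```python
from pathlib import Path
from typing import Callable, List, Tuple

# Each stage: (output file name, agent function producing its text from the previous output).
Stage = Tuple[str, Callable[[str], str]]

def run_pipeline(change_request: str, stages: List[Stage], workdir: Path,
                 approve: Callable[[str, str], bool]) -> List[str]:
    """Run stages in order; stop at the first artifact the human rejects."""
    produced: List[str] = []
    previous = change_request
    for filename, agent in stages:
        text = agent(previous)
        (workdir / filename).write_text(text, encoding="utf-8")
        produced.append(filename)
        if not approve(filename, text):   # human review gate between every stage
            break
        previous = text
    return produced

# Placeholder agents standing in for the "review", "taskfile", and "implement" prompts.
stages: List[Stage] = [
    ("reviewfile.md", lambda req: f"Review of: {req}"),
    ("taskfile.md", lambda review: f"Tasks from: {review}"),
    ("implementation.diff", lambda tasks: f"Diff for: {tasks}"),
]
```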
pjbeam 13 hours ago | parent
Including altering the turn concept. I think it is still ultimately call and response, but instead of everything being a quarter note, you can get a little closer to a beat you like.
mbork_pl 12 hours ago | parent
aitor1717 12 hours ago | parent
The only thing that I consistently do is create a simple html dashboard with a to-do list I can guide claude code with while rendering progress somewhat graphically. I love the levers but it's kinda the opposite of the flow in question.
doganarif 12 hours ago | parent
snissn 12 hours ago | parent
B) opinionated skills that use GitHub tickets, merge gates and execution of ticket graphs
dosisking 12 hours ago | parent
LLM AI is like Uber for the mind.
SwellJoe 12 hours ago | parent
Some stuff I've built:
https://github.com/swelljoe/tandem - Tandem is a sysadmin buddy that travels with you over ssh. Just a wrapper over tmux and claude code (or whatever agent you like), it opens two panes in tmux, one with an ssh session to one of the hundreds of devices I maintain, and one with a local Claude Code configured to use a local work space and instructed via CLAUDE.md/AGENTS.md to use tmux to interact with the remote machine. I built it because a lot of my coworkers were installing Claude Code on our robots and authenticating there to get help with robot troubles, and that felt bad. This allows them to keep all sensitive stuff locally and still get help troubleshooting directly on the device. I happen to find it useful, sometimes, too.
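The two-pane layout can be approximated with plain tmux commands. One pane holds the ssh session, and the agent reads and types into that pane via `send-keys` and `capture-pane`. This sketch only builds the command lines and is not Tandem's actual code. The session name and host are placeholders, and the `:0.0` target assumes tmux's default base indexes of 0.

```python
import shlex
from typing import List

def tandem_layout(session: str, host: str, agent_cmd: str = "claude") -> List[List[str]]:
    """tmux commands for: pane 0 = ssh to the device, pane 1 = local agent."""
    return [
        ["tmux", "new-session", "-d", "-s", session, f"ssh {shlex.quote(host)}"],
        ["tmux", "split-window", "-h", "-t", session, agent_cmd],
    ]

def send_to_remote(session: str, command: str) -> List[str]:
    """What the agent runs to type a command into the ssh pane."""
    return ["tmux", "send-keys", "-t", f"{session}:0.0", command, "Enter"]

def read_remote(session: str) -> List[str]:
    """What the agent runs to read the ssh pane's visible output."""
    return ["tmux", "capture-pane", "-p", "-t", f"{session}:0.0"]

for argv in tandem_layout("robot42", "admin@robot42.local"):
    print(shlex.join(argv))   # pass argv to subprocess.run(...) to execute
```

Only the terminal is shared with the device. The agent process, its credentials, and its workspace stay on the local machine.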
https://github.com/swelljoe/nelson - Nelson is a fancy Ralph loop for security bug hunting that I built to help audit my own software. It's also grown to include a benchmark suite I'm using to figure out which models are worth using for security work. I've published some of those benchmark results, and have a few hundred hours/dollars worth of new ones to publish this weekend. Turns out the benchmarking is more interesting, so that's gotten more attention than the bug-hunting side, but the benchmarks inform how the bug-hunting side works, and I added multi-model/multi-pass scans and de-dupe features recently because I found that letting models have a couple bites at the apple increases discovery, and there are bugs that only some models catch, and it's not always the top model that finds them. There's some overlap, but also some divergence. This research has also led me to start working on a harness for security auditing tasks; giving the agent tools and project structure data to lift detection and reduce false positives.
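The multi-model, multi-pass de-dupe step can be sketched as merging findings on a normalized key while recording which models reported each one. The `Finding` fields and the line-bucketing heuristic are assumptions, not Nelson's actual schema.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, FrozenSet, Iterable, Set, Tuple

@dataclass(frozen=True)
class Finding:
    model: str
    file: str
    line: int
    cwe: str   # weakness class, e.g. "CWE-78"

Key = Tuple[str, str, int]

def dedupe(findings: Iterable[Finding], bucket: int = 5) -> Dict[Key, FrozenSet[str]]:
    """Group findings that likely point at the same bug; map each to the models that found it."""
    groups: Dict[Key, Set[str]] = defaultdict(set)
    for f in findings:
        key = (f.file, f.cwe, f.line // bucket)   # nearby lines collapse together
        groups[key].add(f.model)
    return {k: frozenset(v) for k, v in groups.items()}

runs = [
    Finding("model-a", "auth.c", 40, "CWE-787"),
    Finding("model-b", "auth.c", 41, "CWE-787"),   # same bug, slightly different line
    Finding("model-b", "cli.c", 12, "CWE-78"),     # only one model caught this one
]
for (file, cwe, _), models in dedupe(runs).items():
    print(file, cwe, sorted(models))
```

Fixed buckets are crude: reports at lines 39 and 40 land in different buckets. A real tool would probably compare line ranges or code snippets instead. Keeping the set of models per finding is what shows the divergence between models described above.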
https://github.com/swelljoe/flar - FLAR is the Fast Light Agent Restrictor. It bubblewraps an agent so it is quite safe to use agents on your local machine, even with `--dangerously-skip-permissions` (which makes agents more fun to use). The sandbox feature found in most agents is porous and can be expanded by the agent harness itself. Similarly, if the agent introduces a supply chain attack into your code and runs it before you get a chance to audit/review it in a PR or run it through an SBOM dependency checker, the blast radius is exactly the project directory and the credentials/history of the one agent. (Whereas, without flar, the blast radius is your whole .ssh, github creds, all agent creds, your keyring, whatever secrets are in your home, etc.) This one is new. Just made it because I was talking about how I always put agents in VMs because I don't trust them. Someone suggested `srt` (https://github.com/anthropic-experimental/sandbox-runtime) and I like the idea but I don't like how complicated and huge and JavaScript it is. You can read and understand the entirety of `flar` in one sitting. Anyway, to break out of "prompt/response", you have to skip permissions, or call it via `claude -p` or API with tasks to perform. Nelson does the latter and `flar` does the former.
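The bubblewrap idea can be sketched as an argv builder. The host filesystem is mounted read-only, the home directory is replaced with an empty tmpfs (hiding `~/.ssh`, keyrings, and other agents' credentials), and only the project directory plus one agent's own state directory are bound writable. The flags are standard `bwrap` options, but this mount layout is an assumption and not FLAR's actual policy.

```python
import os
from typing import List, Optional

def bwrap_argv(project: str, agent_state: str, command: List[str],
               home: Optional[str] = None) -> List[str]:
    """Build a bubblewrap command line that confines an agent to one project."""
    home = home or os.path.expanduser("~")
    return [
        "bwrap",
        "--ro-bind", "/", "/",                # whole host visible, but read-only
        "--dev", "/dev",                      # minimal fresh /dev
        "--proc", "/proc",                    # fresh /proc
        "--tmpfs", home,                      # empty home: no .ssh, keyring, other creds
        "--tmpfs", "/tmp",                    # private writable scratch space
        "--bind", agent_state, agent_state,   # this one agent's config/history, writable
        "--bind", project, project,           # the project directory, writable
        "--chdir", project,
        "--unshare-pid",
        "--die-with-parent",
        *command,
    ]

# Hypothetical paths; subprocess.run(argv) would launch the agent (Linux only).
argv = bwrap_argv("/home/me/src/app", "/home/me/.agent", ["agent-cli"], home="/home/me")
```

Order matters: bwrap applies mounts in sequence, so the binds must come after the home tmpfs or they would be hidden by it. Networking is left shared so the agent can still reach its API.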
That's not to mention all the side projects and other stuff I've been able to make a lot of progress on.
The biggest one is finishing https://venturous.app/ (or, at least I made it do what I most wanted it to do, which is provide map overlays of US public lands and mobile data provider coverage so I can find cool places to camp free while staying connected). This is a re-implementation of an old defunct app called FreeRoam that I absolutely loved when I traveled full-time. I built half of it over several months by hand, and then Claude helped finish it in a few weekends and holidays. I'll get Claude to help build the mobile apps someday.
karlkloss 12 hours ago | parent
Then I got tired and told it to use Playwright to control the browser and test by itself. After some hangs that I had to stop manually, it did everything by itself and finally fixed the bug. I had to increase the agent's steps setting in the config, but that was it. While it was fixing the bug, I surfed the web and kept an eye on it, but it did everything on its own. Impressive.
danmaz74 12 hours ago | parent
Summarizing it a lot, what it does is:
* help you make better plans
* split plans into iterations, in a module-aware way for projects which have strict modularity (for now I'm doing this specifically with TypeScript and dependency cruiser) - this helps a lot when a project becomes complex
* ask an agent to implement an iteration, and then programmatically run a lot of checks after each iteration - not just regression tests, but also checks against project principles and conventions
* when possible, automatically fix deviations; when not possible, raise them to myself for an end-of-plan review
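The iterate-then-check loop can be sketched like this. Run one iteration, run every programmatic check, auto-fix what has a known fixer, and queue the rest for the end-of-plan review. The `Check` and `Fixer` signatures are hypothetical. In the described tool, the checks include regression tests and principle/convention checks such as dependency-cruiser rules.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional

@dataclass
class Deviation:
    check: str
    detail: str

# A check inspects the codebase state and returns the deviations it found.
Check = Callable[[dict], List[Deviation]]
# A fixer tries to repair one deviation; returns the new state, or None if it can't.
Fixer = Callable[[dict, Deviation], Optional[dict]]

@dataclass
class PlanRun:
    checks: Dict[str, Check]
    fixers: Dict[str, Fixer]
    for_review: List[Deviation] = field(default_factory=list)

    def after_iteration(self, state: dict) -> dict:
        for name, check in self.checks.items():
            for dev in check(state):
                fixer = self.fixers.get(name)
                fixed = fixer(state, dev) if fixer else None
                if fixed is None:
                    self.for_review.append(dev)   # raised to the human at end of plan
                else:
                    state = fixed
        return state

    def run(self, state: dict, iterations: List[Callable[[dict], dict]]) -> dict:
        for implement in iterations:              # each one is an agent-implemented iteration
            state = self.after_iteration(implement(state))
        return state
```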
In this way, instead of having to constantly be engaged with the chat interface, with all the shorter or longer wait times which break my flow, I spend a lot of highly focused time during initial planning and final review. A plan implementation can go on for hours, and the various anchoring mechanisms added to the tool keep drift to a minimum.
At some point I'm planning to release this tool as open source. Since it's the result of months of trial and error, dogfooding, and vibecoding on the tool itself, the codebase is chaotic and the UI is still full of experiments I mostly abandoned, and I'm not used to releasing stuff in this state. But perhaps, in this brave new world, I should just do it and see what happens?
cuttothechase 1 hour ago | parent
danmaz74 4 minutes ago | parent
But seriously, if you care, this is just like using an existing library to do the lower level development work which I think is already pretty well done by the existing agents. It's not a design decision.
kosolam 48 minutes ago | parent
danmaz74 8 minutes ago | parent
pigpop 16 minutes ago | parent
danmaz74 2 minutes ago | parent
By the way, if you're not doing that, something that can really help when doing UI/UX work is to have the agent create some mockups, and then tests based on those - I'm using Cucumber with some extra sauce for this. It's a very nice way to guide the agent in a falsifiable way.
captainbland 12 hours ago | parent
I think a lot of people get a sort of novelty effect when first interacting with an LLM which can feel superficially like flow, but it's different in that it eventually wanes and what really happens in practice is you're encouraged to disengage and this makes it almost impossible to get into a true flow state.
The risk here I think is that if you get humans disengaging from the task at hand, there's a higher chance of bugs being introduced. You might move slightly faster in the short term but be forced to hit the brakes in the medium/long term.
bryanhogan 12 hours ago | parent
What I've found useful is to create a tasks.md file where each bullet point / task is one implementation. Bullet points that belong together and can be done in the same chat session are grouped together.
I easily enter a flow state during writing these detailed implementation plans. Then I can also start multiple chat sessions for parts that don't interfere with each other, while I'm waiting for an LLM answer for one part I can get started on the next or start reviewing one of the previous answers.
I have also explored more complex setups, e.g. using a Kanban board for tasks, but I found great value in these simple yet effective ones.
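A small parser for the tasks.md layout described above. It assumes groups are separated by blank lines and that each task is a `- ` or `* ` bullet; that exact layout is an assumption. Each group becomes the prompt for one chat session.

```python
from typing import List

def task_groups(markdown: str) -> List[List[str]]:
    """Split a tasks.md into groups of bullets; blank lines separate groups."""
    groups: List[List[str]] = []
    current: List[str] = []
    for raw in markdown.splitlines():
        line = raw.strip()
        if line.startswith(("- ", "* ")):
            current.append(line[2:].strip())
        elif not line and current:       # a blank line closes the current group
            groups.append(current)
            current = []
    if current:
        groups.append(current)
    return groups

TASKS = """\
- Add `created_at` column to invoices
- Backfill `created_at` from audit log

- Add CSV export button
- Wire export button to /api/export
"""

for i, group in enumerate(task_groups(TASKS), 1):
    prompt = "Implement these tasks, in order:\n" + "\n".join(f"{n}. {t}" for n, t in enumerate(group, 1))
    print(f"--- session {i} ---\n{prompt}")
```

Groups that touch unrelated parts of the code can be sent to separate sessions in parallel, as described above.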
kordlessagain 12 hours ago | parent
There's cmux in this space, but I had already used Hyper for years, so I decided it was time to fork something and build on it. Cmux does tabs in panes AND panes in tabs. Hyperia does addressable panes in tabs and windows. I've tried to keep it minimalistic, which helps with flowing back and forth between different projects (I typically work on 3-4 at a time). I added a Rust sidecar, making all objects addressable over MCP, so Claude Code, Codex, or a small local model on Ollama can split panes, run commands, and read screens, with one hard rule enforced in the harness rather than the prompt: an agent can never move my focus, other than asking for permission to access a new object. ACLs too. Hyperia also carries an agent loop that wires into its own MCP server, so a local model in a JavaScript "shell" can control resizing the terminal (handy for videos), or opening a project and setting up the agent panes.
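Enforcing "an agent can never move my focus" in the harness rather than the prompt means every tool call is checked in code before it executes. Below is a hypothetical dispatcher. The method names, pane addresses, and ACL shape are invented and are not Hyperia's MCP interface.

```python
from typing import Dict, List, Set, Tuple

FOCUS_CHANGING = {"focus_pane", "switch_tab", "raise_window"}   # never allowed for agents

class Harness:
    def __init__(self) -> None:
        self.acl: Dict[str, Set[str]] = {}             # agent -> addresses it may touch
        self.pending: List[Tuple[str, str]] = []       # access requests awaiting the human

    def grant(self, agent: str, address: str) -> None:
        self.acl.setdefault(agent, set()).add(address)

    def call(self, agent: str, method: str, address: str) -> str:
        if method in FOCUS_CHANGING:
            return "denied: agents cannot move focus"
        if address not in self.acl.get(agent, set()):
            self.pending.append((agent, address))     # ask the human instead of acting
            return "pending: permission requested"
        return f"ok: {method} on {address}"
```

Because the rule lives in the dispatcher, no prompt (or prompt injection) can talk the model past it.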
I stay typing in a pane while the agents work in theirs, in my peripheral vision, and web panes sit right next to terminals so docs, webapps/sites and the agent chat live in the same window. Reviewing becomes glancing instead of context switching, which is the closest thing to ideal flow with prompts I've gotten out of this auto-AI stuff. Tab and pane clicks copy the address into the buffer, then I paste and issue commands referencing what I want dealt with. I have an SDR radio on my box that allows me to talk to a given pane (WIP not in the build yet). Working on getting the local agent stuff done and wired to the radio.
The upshot of this approach is enabling agents running in one tab, all mounting the same directory, with one in charge of the others. Claude Code is great at this, and it saves on the tokens it would normally use for doing it itself. I talk to Claude, or whatever I pick, and it talks to the rest of the agents and coordinates the work. I like Antigravity a lot because it moves crazy fast for coding. For example: Claude in control, with GLM-5.2 doing auditing and explaining to me how development is going. No unseeable agent army here. No need for it, actually.
About the only thing that trips me up at the moment is having to work on Hyperia itself, which I don't do inside of it because of restarts. When I work on Hyperia, I start an agent in Windows Terminal and wire it into the MCP for testing. I build installers constantly as well, and then run through the QA process by using it to work on other projects I'm doing.
I use Zed for code editing and viewing, but rarely. I also just open things in special sticky notes (or have the agent do it) so I understand how we're doing things. GLM-5.2 took to the planning stickies like a fish to water.
https://github.com/deepbluedynamics/hyperia
https://github.com/deepbluedynamics/nemesis8 (n8)
Both are open source, obviously. It's worth mentioning they will remain that way and will never require a service plan or any other cost. I built them because I needed it for another project I will be selling, not aimed at developers at all.
n8 runs the agents in containers. This is a separation of concerns: it runs in any terminal, controls session starts, and searches previous sessions (as well as monitoring token usage, CPU, network, and file access). I'm working on the dashboard for that now, so I can easily see which files are changing, how much they changed, and what changed in them. I co-founded Loggly, so that crap is in my wheelhouse.
This isn't the tab completion model. It works great for the way my brain works, but I also think having an agentic terminal is a good move for anyone writing code, and we'd all be better off if we ran agents in containers rather than on our bare metal. It makes it way easier to see what the agent is doing (and to resume later), and allows it to do most of its work in the container, as opposed to running loose on my box.
notahan 11 hours ago | parent
I have tried out some of the popular tools and I'm using opencode on desktop and I use pi via termux on android for when I'm on the go. I think the current direction of PRD -> review -> execute -> debug is in many cases the right mindset.
Working with a team of fresh graduates, I see that working with any vibe coding tool is like being a manager, not a developer. I think that's what you miss: you miss being a developer, but the vibe coding tools make you a manager, which might not be something you enjoy.
Nonetheless, I do think that there are some interesting things to do with pi. I'm just getting started, if anyone has an interesting workflow in pi, I would be interested in trying it out!
magicmadrid00 11 hours ago | parent
What still puzzles me a lot is how you can accept that AI just writes the code for you, without you being the one making decisions about how the code — not the spec — will be written. How are you able to get things done while still keeping a good understanding of everything you did?
Maybe I’m wrong, or maybe it’s because I haven’t pushed the agentic way to its limits, but I really haven’t found it to be a good way to produce good work in general.
alexfortin 10 hours ago | parent
Currently I'm refining what I think works best for me, which I'd call something like "issues/PR based LLM workflow", powered mainly by this action I'm building on top of the Pi coding agent SDK: https://github.com/shaftoe/pi-coding-agent-action
Essentially I issue prompts swapping between the terminal and the git forge web app (GitHub and my own Forgejo instance) and it currently looks something like this:
- create an issue whose level of detail and spec quality depends on how important the task or the project is
- trigger a Pi session by posting a comment in the forge prefixed with "/pi ", either to produce a report or to, e.g., implement the change in a new PR
- trigger more sessions in the same thread, be it an issue or a PR, to steer or to add more requests, like forking out a new PR or similar. This also works for reviews, so I just add comments and then submit a review with "/pi follow the comments instructions" or similar
- if I want more fine-grained control and I am at the workstation, I use the bridging Pi extension to pick up the work locally: https://github.com/shaftoe/pi-coding-agent-action/tree/devel...
- rinse and repeat until I'm either happy with the change or the PR is so bloated that I get rid of it and start anew
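The `/pi ` comment trigger can be sketched as a small parser that a CI job runs on each new issue or PR comment. Only comments starting with the prefix start a session, and the rest of the comment becomes the instruction. The function names and prompt format are hypothetical, and the real action's event handling may differ.

```python
from typing import List, Optional

TRIGGER = "/pi "

def extract_instruction(comment_body: str) -> Optional[str]:
    """Return the instruction for the agent, or None if the comment isn't a trigger."""
    body = comment_body.lstrip()
    if not body.startswith(TRIGGER):
        return None
    instruction = body[len(TRIGGER):].strip()
    return instruction or None

def build_prompt(thread_title: str, thread_history: List[str], comment_body: str) -> Optional[str]:
    """Combine the issue/PR thread with the new instruction; the thread acts as memory."""
    instruction = extract_instruction(comment_body)
    if instruction is None:
        return None
    history = "\n".join(f"> {c}" for c in thread_history)
    return f"Thread: {thread_title}\nPrevious discussion:\n{history}\n\nTask: {instruction}"
```

Feeding the whole thread back in each time is what gives the "memory for free from the ticketing system" mentioned below.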
I know it's probably something Claude / Codex / Cursor offer with their web apps, but I want the freedom and flexibility to use the LLM provider/model I want, and Pi as a harness does that plus all the rest exceptionally well. Another advantage is that I can fit the LLM action into any pipeline I want and take care of chores like automated changelog generation and whatnot.
As I said, it's still mostly a work in progress, but in general I think there's a lot of potential in this kind of workflow: it forces me to keep the scope of the changes small (I still want to review the PR content, after all) and gives me memory for free just by leveraging the ticketing system. I also like that the harness runs most of the time in the CI/CD sandbox, which, in the case of Forgejo, I control fully.
PS I try to keep my work with/on AI tools on my website at https://a.l3x.in/ai
pramodbiligiri 9 hours ago | parent
I built one such tool for myself: https://www.shipsmooth.net. You can use it to spec/plan out a piece of work, and then easily keep updating the spec/plan as you churn through its implementation. The tool assumes that you will pretty much end up changing the spec/plan during implementation, based on how it's going. In general, I don't see how it's possible to one-shot high quality code for custom use cases.
[1] Going by the definition of flow state here: https://en.wikipedia.org/wiki/Flow_(psychology): "fully immersed in a feeling of energized focus, full involvement, and enjoyment in the process of the activity. In essence, flow is characterized by the complete absorption in what one does, and a resulting transformation in one's sense of time."
madprops 9 hours ago | parent
gaigalas 9 hours ago | parent
There's lots of those still. Portable shell programming is my favorite. Even the most capable models limp at it, but I thrive on my own, so it becomes an interaction where I really feel I need to think.
2. Work on dense programs, and use the LLM for debugging only. LLMs suck at writing dense code. They thrive on redundancy and verbosity, so dense code pushes you to avoid using them for the main thing and to use them for adjacent work instead.
3. Multitask. Ride several bikes at once, but not for the sake of doing more (for that you could automate), do it for the multitasking. Parallelize, split projects into multiple work fronts, work on reducing the time to mental switch between contexts. It's not coding per se, but a great skill, AI involved or not.
colinmarc 8 hours ago | parent
If you're unfamiliar, it's like tab-completion, but it has a context that includes the edits you've made in the last few seconds, and it can predict around the cursor.
The model isn't advanced enough to understand complex tasks, but it has more the feel of the "crafting gun" in Subnautica or other survival crafting games, if that analogy makes any sense.
Personally I hate working with a chatbot - it's low-bandwidth and rage-inducing. If I could imagine a perfect workflow, it would be something like me whispering my train of thought as I program, and then pointing a very fancy "autocomplete gun" at the code.
eddd-ddde 8 hours ago | parent
You can actually go super fast with the right setup and by focusing only on the important details, like ensuring the shape of the APIs makes sense and that test quality is good.
othmanosx 7 hours ago | parent
1. Start a session.
2. Grill my requirements.
3. Write an ADR, then either start implementing or separate into pieces.
4. Review the code on pyor.review. Compared to GitHub, Pyor lets me categorize the files and changes, then review the important stuff and skim the noise it identifies.
5. Since I can do local reviews with Pyor, I can do that with Claude and feed back my comments to be addressed without it going to GitHub first.
6. Create a PR then merge it.
jarodrh 4 hours ago | parent
On your actual question, though: I think the loop you're describing does break the flow and gets very frustrating, but it's been a long time since I've experienced it.
Three things happened to me in the past few months: I've become cost conscious, I wanted to get more done faster, and I wanted to be able to do a lot more at the same time (in parallel). With that, I developed my own workflow that works well for me. It's a config-led setup routed by tiers: cheap fast models for mechanical work (lookups, log reads), mid models for implementing against written specs, strong models for judgement and review. It's config on top of a standard harness, nothing exotic.
For me, my flow state has moved from tackling code line by line to traversing the layers of the entire system design in my mind, and being able to clearly articulate this to a strong model.
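A tier router like the one described can be as small as a lookup table. The sketch below assumes a JSON config; the task kinds, tier names, and model names (`cheap-fast-model` and so on) are placeholders, not real model identifiers or any particular harness's config format.

```python
import json

# Hypothetical config: task kind -> tier, tier -> model name.
CONFIG = json.loads("""
{
  "tiers": {"cheap": "cheap-fast-model", "mid": "mid-model", "strong": "strong-model"},
  "routes": {
    "lookup": "cheap", "log_read": "cheap",
    "implement_spec": "mid",
    "review": "strong", "judgement": "strong"
  },
  "default_tier": "strong"
}
""")


def route(task_kind: str, config: dict = CONFIG) -> str:
    """Return the model configured for a task kind, falling back to the default tier."""
    tier = config["routes"].get(task_kind, config["default_tier"])
    return config["tiers"][tier]
```

With this config, `route("log_read")` picks the cheap model and `route("review")` the strong one. Unknown task kinds fall back to the strong tier, which costs more but errs toward judgement; setting `default_tier` to `"cheap"` flips that trade-off.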
OJFord 1 hour ago | parent
reactordev 1 hour ago | parent
tombot 1 hour ago | parent
codybontecou 1 hour ago | parent
_boffin_ 1 hour ago | parent
yogthos 1 hour ago | parent
The LLM would explain what it's doing, then write a bit of code, then you have time to look at it and understand it, and go to the next step. At any point you can interject and discuss or change it.
I find the biggest problem is that once an LLM generates a bunch of code, it's really hard for a human to build up the context for what the code is doing and why. When you're coding normally or pairing, then you're gradually absorbing the context and what the code is doing throughout the process.
The reality is that writing code fast was never the bottleneck. It's understanding the code and making sure it's actually doing what's needed that's hard.
saidnooneever 1 hour ago | parent
It's an interesting exercise. I started with a simple REPL that called models through model adapters, then allowed them to list directories and read files within a chroot, and slowly built up to write access to files. Then look at what's out there and try to build the parts you like.
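A rough sketch of that starting point, assuming a model adapter is anything with a `complete(prompt) -> str` method. Two things are swapped in: confinement is a resolved-path check rather than a real `chroot` (which needs root privileges), and tool calls use a made-up `TOOL list_dir <path>` reply convention. `Sandbox`, `dispatch`, and `run_turn` are hypothetical names.

```python
import os
from pathlib import Path
from typing import List, Protocol


class ModelAdapter(Protocol):
    def complete(self, prompt: str) -> str: ...


class Sandbox:
    """Confine file tools to one root directory (path check, not a real chroot)."""

    def __init__(self, root: str, allow_write: bool = False):
        self.root = Path(root).resolve()
        self.allow_write = allow_write  # start read-only, enable writes later

    def _resolve(self, rel: str) -> Path:
        # resolve() follows symlinks and "..", so escapes are caught here.
        p = (self.root / rel).resolve()
        if p != self.root and self.root not in p.parents:
            raise PermissionError(f"path escapes sandbox: {rel}")
        return p

    def list_dir(self, rel: str = ".") -> List[str]:
        return sorted(os.listdir(self._resolve(rel)))

    def read_file(self, rel: str) -> str:
        return self._resolve(rel).read_text()

    def write_file(self, rel: str, content: str) -> None:
        if not self.allow_write:
            raise PermissionError("write access not enabled")
        self._resolve(rel).write_text(content)


def dispatch(sandbox: Sandbox, call: str) -> str:
    """Run a tool call of the hypothetical form 'list_dir <path>' or 'read_file <path>'."""
    name, _, arg = call.partition(" ")
    if name == "list_dir":
        return "\n".join(sandbox.list_dir(arg or "."))
    if name == "read_file":
        return sandbox.read_file(arg)
    return f"unknown tool: {name}"


def run_turn(model: ModelAdapter, sandbox: Sandbox, prompt: str) -> str:
    """One REPL turn: if the model replies 'TOOL <call>', run it and ask again with the result."""
    reply = model.complete(prompt)
    if reply.startswith("TOOL "):
        result = dispatch(sandbox, reply[len("TOOL "):])
        reply = model.complete(f"{prompt}\n[tool result]\n{result}")
    return reply
```

Keeping `allow_write` off by default mirrors the "build up slowly" approach: reads work first, and writes are an explicit opt-in.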
The prompts are hard, and you will hit some weird issues that also help you understand certain fundamental limits. Understanding those also helps explain why some things don't work as hoped just yet.
For example, I had a real headache trying to make interactive specialized identities within workflows, so that each stage is handled by a specialized identity with specific tools and focused context. There's a lot of hallucination too, so you need many more model calls: maybe consensus between models, adversarial identities that review outputs before they're applied, and so on. It's all stuff you still end up doing yourself again, despite having programmed/prompted it all in...
Initially it was all one context, and the identities struggled to remember which part of the process they handled, which tools they had, and what tool outputs to expect from previous stages (it was funny, but a big mess).
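The consensus and adversarial-review steps might look something like this in their simplest form. It's a sketch under assumptions: models are plain `str -> str` callables, consensus is a simple majority over answers with surrounding whitespace stripped (exact string matching only works for short, canonical answers), and the reviewer signals approval by replying `APPROVE`. None of these names come from the comment.

```python
from collections import Counter
from typing import Callable, List, Optional

Model = Callable[[str], str]  # hypothetical: any prompt -> completion function


def consensus(models: List[Model], prompt: str, quorum: float = 0.5) -> Optional[str]:
    """Return the answer given by more than `quorum` of the models, else None."""
    if not models:
        raise ValueError("need at least one model")
    # Strip surrounding whitespace so trivially different answers still match.
    answers = [m(prompt).strip() for m in models]
    answer, votes = Counter(answers).most_common(1)[0]
    return answer if votes / len(answers) > quorum else None


def apply_with_review(proposal: str, reviewer: Model,
                      apply: Callable[[str], None]) -> bool:
    """Adversarial gate: apply the proposal only if the reviewer replies APPROVE."""
    verdict = reviewer(
        f"Look for problems in this change, or reply APPROVE if there are none:\n{proposal}"
    )
    if verdict.strip().upper().startswith("APPROVE"):
        apply(proposal)
        return True
    return False
```

Each extra voter or reviewer is another model call per step, which is exactly the cost the comment describes.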
I use Codex now; it's the closest to what I want, and I couldn't get anything better myself. Claude wants to do too much and 'complete' stuff too much for me.
There are people blogging about loop programming. I haven't investigated it thoroughly yet, but I'd expect similar results to my previous endeavor.
Edit: I wanted to add my motivation. Claude dumps a lot of text back (I was using it back then), so I wanted to give my models part of the screen as a 'surface' to pin images, charts, text, etc. This worked nicely, but I could not get them to do it really organically (prompting issues).
I thought it would be cool if the model could say, "hey human, here's this thing we keep on screen while we discuss/design," like an architecture diagram. I went with Vulkan/glfw3 and rendered a terminal in there to get pixel-accurate graphics good enough for presentation. That worked well, and Claude built it really easily.
felix-the-cat 51 minutes ago | parent
Then I thought it would be fun to be able to monitor the status of all my workflows as buttons on my Stream Deck XL, and Claude was able to build the plugin with almost no issues at all. It's hilarious how much fun it is.
aleqs 51 minutes ago | parent
The main bottleneck at this point is the cost of all of the tokens in the fairly large test matrix of tasks, harnesses, and models.
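Why such a matrix gets expensive is plain multiplication: every model or harness you add multiplies the number of runs. A back-of-the-envelope estimator follows; `matrix_cost` is a hypothetical helper, and all task sizes and per-million-token prices are made-up numbers for illustration.

```python
from itertools import product
from typing import Dict, List


def matrix_cost(tasks: Dict[str, int], harnesses: List[str],
                prices: Dict[str, float], repeats: int = 1) -> float:
    """Estimated USD for running every (task, harness, model) combination.

    tasks:  task name -> estimated tokens consumed per run
    prices: model name -> blended USD per million tokens
    """
    total = 0.0
    for task, _harness, model in product(tasks, harnesses, prices):
        total += repeats * tasks[task] * prices[model] / 1_000_000
    return total


# Made-up numbers: 2 tasks x 3 harnesses x 2 models = 12 combinations, 5 repeats each.
example = matrix_cost(
    {"fix-bug": 200_000, "add-feature": 500_000},
    ["harness-a", "harness-b", "harness-c"],
    {"model-a": 3.0, "model-b": 15.0},
    repeats=5,
)
print(example)  # 189.0
```

The harness doesn't change the per-run estimate in this simplified model, but each one still multiplies the run count, and repeats (needed because agent runs are noisy) multiply it again.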
I hope to release/open source all of this stuff eventually.
worik 9 minutes ago | parent
Are you using Chinese models? Quite a bit cheaper, but maybe still too expensive?
jwardbond 44 minutes ago | parent
a) It was way too easy to just auto-approve everything. Answering the 5-10 spec questions it asked me made me feel like I was an important part of the loop, but really it was just a way to make me feel important while spraying my slop cannon.
b) I wasn't actually learning anything, defeating the whole purpose of the internship I worked hard to get.
I am now using a workflow where the brainstorming process is the same, but I have claude write an instructional document for me to implement. It has instructions to ask me questions about what I know / want to know, to lay out the plan iteratively with lots of verification steps, and to heavily explain portions of the code that are unfamiliar to me. It's sorta like making my own custom tutorials specifically for the problem I am working on.
It's a little slower, but not too bad since it does still put whole codeblocks in the instructions. I have a much better understanding of what I am doing, I still get to enjoy learning and programming and improving, and I don't feel like a reverse centaur.
anthonyfrisby 42 minutes ago | parent
Walk coding. Walkoding, if you like.
Use a harness (create one if you like), then load it up in Telegram and off you go. I've shipped numerous features while on solo hiking trips. It means you can stay concentrated on your task without sitting there being bored.
It’s truly liberating, highly recommend.
tony_cannistra 28 minutes ago | parent
y0eswddl 9 minutes ago | parent
Garlef 41 minutes ago | parent
weitendorf 33 minutes ago | parent
To be clear, the browser IS the harness: it's not just a browser-based UI but also the sandbox and orchestration layer. Because LLMs get deep browser access (through CDP and some special hooks), they can verify their own UIs immediately after writing them, navigate the web natively, and run commands that directly manipulate the active DOM. This creates a very tight feedback loop for UI work, but it also lets you create or run browser automations, query a site by running a JavaScript query on its contents, or run a web page without deploying or uploading it anywhere, which is pretty powerful. What I really like is that this makes it easy to dispatch cheap models to generate and verify tons of little visualizations using SVG.
Locally it's just a browser, but you can manage remote instances either as tabs in any local browser or as inline collapsible iframes. I'm trying to be cautious with the security side of it, so we're not marketing it as a product yet, but I would love to work with anybody who is interested and does a lot of UI or cloud work!
I'm excited about this particular moment in tech because I think work is going to end up looking like playing StarCraft with data and AI, surrounded by rich custom media as you work, which feels really futuristic to me!
theodorewiles 22 minutes ago | parent
What I am exploring is adding another step to the classic 'research / plan / implement' pattern: 'research / plan / LEARN / implement', where LEARN involves the human doing AI tutoring sessions to gain a deep understanding of the concepts the LLM is planning to implement, so you can refine and iterate on plans and direct the LLM in ever more effective ways. My idea is that this compounds your human capital and reduces the occurrence of the 'sounds smart, doesn't work' pattern.
ArtRichards 9 minutes ago | parent
My flow state is thinking about and understanding this: am I solving a problem that needs to be solved now, for the right person?
I created this to help me understand it (project foundations + create milestones) and then bring it to reality (ship milestones).
tasoeur 8 minutes ago | parent
seanmcdirmid 6 minutes ago | parent
It is more like people (agent?) management than coding though. I'm setting up and debugging processes, rather than writing code. I spend a lot of time cursing at and arguing with the agents I'm using to set up hermetic agents (who I can't argue with obviously, but I can have conventional agents go over their logs to figure out how to improve their sandboxed-context).