39 points make_it_sure 19 hours ago 84 comments
Many won't care unless you show them an actual study.
So my question is: are there any studies of the companies that actually make it work with AI?
anovikov 19 hours ago | parent
dudewhocodes 18 hours ago | parent
Most of these apps are rudimentary habit trackers, time-management apps, etc., so not much creativity, mostly recycled ideas. More code != better ideas though.
https://www.a16z.news/i/185469925/app-store-engage https://42matters.com/ios-apple-app-store-statistics-and-tre...
whstl 18 hours ago | parent
massysett 18 hours ago | parent
anovikov 14 hours ago | parent
End of 2023: 1,870,119 apps
End of 2024: 1,961,596 apps
Now: 2,150,612 apps, after 1.18 years.
+160k apps a year, that's about 75% above the pre-AI rate of ~91k/year (safe to say that apps were not routinely built with AI in 2023 yet). A noticeable increase, but it doesn't feel dramatic, especially since yes, the majority of those new apps are low-effort trash like those described in this thread.
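A quick sketch of the arithmetic behind those rates, using only the counts quoted above:

```python
# App Store totals as quoted in the comment above.
end_2023 = 1_870_119
end_2024 = 1_961_596
now = 2_150_612
years_since_end_2024 = 1.18

# Pre-AI baseline: growth during 2024.
pre_ai_rate = end_2024 - end_2023                      # ~91k apps/year
# Current rate: growth since end of 2024, annualised.
current_rate = (now - end_2024) / years_since_end_2024  # ~160k apps/year

print(f"pre-AI rate:  {pre_ai_rate:,.0f}/year")
print(f"current rate: {current_rate:,.0f}/year")
print(f"increase:     {current_rate / pre_ai_rate - 1:.0%}")  # ~75% above baseline
```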
therouwboat 18 hours ago | parent
PunchyHamster 16 hours ago | parent
There are more apps, and webpages, and software, and a whole lot of other stuff.
It's just not good.
flawn 19 hours ago | parent
austin-cheney 19 hours ago | parent
Lionga 19 hours ago | parent
vjk800 18 hours ago | parent
whstl 18 hours ago | parent
IMO the bottleneck remains the same: doing proper engineering is more than writing code. Even 20 years ago, a big corp would spend a few years writing something that a startup would do in weeks, just because of laser-focused requirements, better processes/less bureaucracy, using the right tools for the job, and having less friction in tooling. That hasn't changed.
graeber_28927 18 hours ago | parent
So I think it's fair to be looking at results a few years in.
Andrej Karpathy famously mentioned in an interview with Dwarkesh Patel [0] that the computer doesn't show up in GDP numbers; there's no noticeable jump or change in slope. Even if Excel is so damn fast, people are likely not drawing on its full potential, and institutions are likely actively resisting change anyway.
My take is that the general population hasn't found the productive levers yet. They're at the stage where they're happy to drag down and auto-generate a date list in Excel, but don't know how to adjust diagrams or read function docs, to say nothing of VBA scripting. And the enthusiast (dev) community, I'd say, is starting adoption with internal tools and shot-in-the-dark apps, but big successes need time to mature in all the other ways (design, reliability, user feedback, marketing...), which comes back to what you said: that needs time. Product-market fit doesn't happen automatically by chance or good prompting, I would like to think.
stephbook 17 hours ago | parent
That's certainly an interesting take. Where do these people think the 1-2% annual growth came from — steam engine late adopters?
AnimalMuppet 9 hours ago | parent
The conundrum in the 1980s and 1990s was: growth hasn't increased, despite all the computer adoption. Why not?
chrisjj 18 hours ago | parent
Beats me. With "AI" being so good at faking stuff, there should be a ton of such studies by now :)
IshKebab 18 hours ago | parent
There's a mountain of things that we reasonably know to be true but haven't done studies on. Is it beneficial for programming languages to support comments? Are regexes error-prone? Does static typing improve productivity on large projects? Is distributed version control better than centralised (lock-based)? Etc.
Also you can't just say "AI improves productivity". What kind of AI? What are you using it for? If you're making static landing pages... yeah obviously it's going to help. Writing device drivers in Ada? Not so much.
teew 18 hours ago | parent
AugustoCAS 18 hours ago | parent
The gains are a ~17% increase in individual effectiveness, but ~9% extra instability.
In my experience using AI-assisted coding for a bit longer than 2 years, the benefit is close to what DORA reported (maybe a bit higher, around 25%). Nothing close to an average of 2x, 5x, 10x. There's a 10x on some very specific tasks, but also a negative factor on others, as seemingly trivial but high-impact bugs get to production that would normally have been caught very early in development or in code reviews.
Obviously depends what one does. Using AI to build a UI to share cat pictures has a different risk appetite than building a payments backend.
lucasluitjes 18 hours ago | parent
That 17% increase is in self-reported effectiveness. The software delivery throughput only went up 3%, at a cost of that 9% extra instability. So you can build 3% faster with 9% more bugs, if I'm reading those numbers right.
yorwba 17 hours ago | parent
The question that people are actually interested in, "After adopting this specific AI tool, will there be a noticeable impact on measures we care about?" is not addressed by this model at all, since they do not compare individual respondents' answers over time, nor is there any attempt to establish causality.
PunchyHamster 16 hours ago | parent
unsupp0rted 18 hours ago | parent
There are genuinely weeks where I go 5x though, and others where I go 0.5x.
duncanfwalker 17 hours ago | parent
muvlon 17 hours ago | parent
duncanfwalker 9 hours ago | parent
orwin 14 hours ago | parent
Three months ago, with opus4.5, I would have said that the productivity improvement was ~10% for my whole team.
I now have to contradict myself: juniors and even experienced new hires with little domain knowledge don't improve as fast as they used to. I still have to write new tasks/issues like I would for someone we just hired, even after 8 months. I still catch the same issues in reviews that we caught three months ago.
Basically, experience doesn't improve productivity as fast as it used to. On easy stuff it doesn't matter (like frontend changes, where the productivity gains are extremely high, probably 10x), and on specific subjects like red teaming, where a collection of small tools beats an integrated solution, I think it can be even better than that.
But I'm in a netsec tooling team, we do hard automation work to solve hard engineering issues, and that is starting to be a problem if juniors don't level up fast.
AnimalMuppet 9 hours ago | parent
Nevermark 18 hours ago | parent
Those that can “see” the potential push through the adaptation period, even when longer than expected.
Depending on how forward looking a group is, the adaptation costs are a problem, a dilemma, or a completely obvious win.
Yet, external measurements don't distinguish between accumulating, accelerating, flat or fading intermediate value.
--
Avoidance of necessary adaptation, even with no immediate impact, becomes the dual. Technical, strategic, or capability debt.
Does that hidden anti-productivity ever get accounted for? When maladaptive firms take their anti-productivity into a hole as they fade/demise?
A company can operate with high margins while its sales fall off a cliff. Is that just "decreasing quantities" of uniformly "high productivity"?
heraldgeezer 18 hours ago | parent
Stronz 18 hours ago | parent
danr4 18 hours ago | parent
otabdeveloper4 18 hours ago | parent
blitzar 18 hours ago | parent
I would have included the flatness of earth, but the flat earthers have some excellent studies (reviewed by their flat earth peers) on the subject.
lysecret 18 hours ago | parent
actionfromafar 17 hours ago | parent
If only we could measure teams against themselves, others, and some kind of baseline; but we don't, AFAIK.
kqr 17 hours ago | parent
We only avoid doing it at scale because it's expensive. In particular if we want the measurement to generalise out of sample.
(In particular in this case, where once we're done, proponents will claim our data is too old to be a useful guide to tomorrow.)
xigoi 15 hours ago | parent
The problem with this is that AI will create worse code that is going to cause more problems in the future, but the measurements won’t take that into account.
kqr 7 hours ago | parent
Thunderer 6 hours ago | parent
arzke 17 hours ago | parent
blitzar 17 hours ago | parent
Unironically, AI evaluating the impact of those lines might be getting close to a metric that would measure output better than having everyone print out their last 6 months of work for the new boss to look at.
PunchyHamster 16 hours ago | parent
rienbdj 18 hours ago | parent
chrysoprace 18 hours ago | parent
esperent 17 hours ago | parent
You need broad economic measurements, not individual or company specific. And that takes a long time plus there's a lot of noise in the data right now (war, for example).
bawolff 18 hours ago | parent
Why are the pro-AI people so obsessed with proving the AI skeptics wrong?
Is AI working for you? Great. Go make great things. Isn't that the point, after all? Who cares who believes you if the results speak for themselves?
squidbeak 17 hours ago | parent
It seems to me the pro-AI types just want to be free to enjoy a transformative tech and discuss the implications of its development and innovations - without being badgered and henpecked or told the results they see are some kind of mass delusion.
PunchyHamster 16 hours ago | parent
You're literally trying to blame the victim. Put "don't show AI content" on every major platform and the henpecking will stop, but (aside from the technical annoyances of doing it) that won't happen, because companies want to force AI down our throats.
squidbeak 7 hours ago | parent
Your argument then is: "Ban the subject of AI from your platforms or we're coming at you with pitchforks. And don't say anything to us when we do, because we are the sad ones here." Correct?
sph 14 hours ago | parent
Cognitive dissonance. "Why are people claiming they do not see any benefit and I do? That is unacceptable, they must be wrong."
I have to admit cognitive dissonance works both ways.
smackeyacky 18 hours ago | parent
charcircuit 17 hours ago | parent
re-thc 17 hours ago | parent
It's not. In a proper org the cost is the testing, the release process, the coordination, the planning, etc.
Any scope creep, even if it fixes something, often gets shouted at.
charcircuit 17 hours ago | parent
IdontKnowRust 17 hours ago | parent
sph 14 hours ago | parent
Learning to write code always was the easy part, learning to write good software is what takes the rest of our careers to get better at.
charcircuit 17 hours ago | parent
If you are familiar with AI it's obvious how it increases productivity. When bugs get fixed with 0 human time it's plain as day that it was productive compared to a human making the fix.
aragilar 17 hours ago | parent
mikkupikku 17 hours ago | parent
I don't know man, could just be in my head. I better defer judgement, put aside all my own opinions about what happened and let some researchers with god knows what axe to grind make that decision for me.
make_it_sure 15 hours ago | parent
ltning 17 hours ago | parent
If anything, there needs to be studies done on
- the drop in creative, novel output from actual people (due to theft and loss of jobs)
- the energy cost per pax in relevant industries, pre/post LLMs being adopted
mikkupikku 17 hours ago | parent
ltning 17 hours ago | parent
And I'm not talking about climate or poor starving artists here. But of course, if everyone thinks like you seem to do we might just give up on having a livable planet in 50 years. Or any significant scientific or artistic progress.
mikkupikku 17 hours ago | parent
PunchyHamster 16 hours ago | parent
mikkupikku 16 hours ago | parent
Also, this slop is substantially slicker and more polished than the software I would have made myself, for myself. Judge away, but when I write something myself, for myself, I take shortcuts and find little excuses to give myself less work. XDG-compliant config? That can wait... Animations? Pfft, skip it. Tooltips on every interactive element? That'll never happen. But with a coding agent doing my bidding, these niceties become realities.
shawntwin 17 hours ago | parent
metalman 17 hours ago | parent
ChicagoDave 17 hours ago | parent
- built AWS dashboard to identify and manage internal resources in a few hours
- solved several production problems connecting Claude to devops APIs in near real-time
- identified solutions for feature requests or bugs for existing internal applications including detailed source changes
- built Ledga.us
- built sharpee.net and its associated GitHub repo
- building mach9 poker ios and android apps
- working on undisclosed app that might disrupt a huge Internet sector
We're still in the early stages of LLM-influenced development, and reliably measuring productivity will take time
ghostlyInc 17 hours ago | parent
Things like generating boilerplate, quick test scaffolding or documentation lookups. Each one is small, but they compound during the day.
That’s probably why it’s hard to capture in traditional studies.
Curious: has anyone seen studies measuring task-level productivity instead of overall output?
felipeerias 17 hours ago | parent
In practice, arriving at this ideal scenario can be very challenging. Actually feasible experiments will be necessarily narrow, with the expectation that their results can be (roughly) extrapolated outside of their specific experimental setup.
Another valid approach would be to carry out qualitative research, for example a case study. This typically requires the study of one (or a few) developers and their specific contexts in great detail. The idea is that a deep understanding of how one person navigates their work and their tools would provide us with insights that might be related to our specific situation.
Personally, in this particular area, I tend to prefer detailed qualitative accounts of how other developers are working on similar projects and with similar tools as me.
But in any case, both approaches are valid and complementary.
lnsru 17 hours ago | parent
hypeatei 17 hours ago | parent
hennell 17 hours ago | parent
Which is the issue with almost all studies and statistics, what it means depends entirely on what you're measuring.
I can program very, very fast if I only consider the happy path, hard-code everything, and don't bother with things like writing tests, defining types, or worrying about performance at expected scale. It's all much faster right up until the point it isn't, and then it's much slower. AI isn't quite so obviously bad, but it can still turn short-term gains into long-term problems, which is what studies tend to focus on, as the short term doesn't usually require a study to observe.
I think AI is similar to outsourcing staff to cheaper countries, replacing ingredients with cheaper alternatives, and other MBA-style ideas. It's almost always instantly beneficial, but the long-term issues are harder to predict and can have far more varied outcomes depending on weird specifics of the business.
jokoon 16 hours ago | parent
It's all make believe
andrewstuart 16 hours ago | parent
And no, no-one is waiting for a “study” to believe in AI, they’re out doing it.
PunchyHamster 16 hours ago | parent
Note that most of them were focused on programming tasks aimed at shipping a product, not other use cases like "prototype a dozen ideas quickly before we pick a direction" or "write/update documentation about this feature", where AI might be significantly more productive than in plain programming.
devilkin 15 hours ago | parent
Yet I have yet to see the first delivery or codebase by that same person. (I am not his manager.)
I lean toward the LLM-skeptic camp. I know they're great for some things (though never for outsourcing your thinking, which unfortunately a lot of people do), but I'd like to see some studies, because the business press reports a lot of net negatives, or at most around a 10% improvement.
eudamoniac 13 hours ago | parent
edanm 5 hours ago | parent
2. That said, almost all the people who "want to see a study" don't make sense to me. I don't remember anyone insisting on seeing a study showing that writing Python is more productive than C; people just used it and largely agreed that it was. How many studies show that git (or other DVCSs) is better than the things that preceded it? I don't know if any exist. I do know that nobody was looking for studies before switching to git.
I don't ever remember seeing any new technology in software development for which people demanded studies before adopting it. They just assumed that if the professional developers they trusted to build their software said something was better, then it was — a correct assumption IMO.
Now, we're seeing a technology which most professional developers — that have used it seriously, at least — insist is orders of magnitude better than anything else that's come before it. And suddenly developers can't be trusted? Suddenly, when the claimed effect is orders of magnitude bigger than almost any other new technology, developers are biased and incapable of making this kind of determination?
I really don't think that's a serious position to hold.
000ooo000 4 hours ago | parent
You can't just assert this. I could equally-baselessly say most professional developers have used LLMs and find them, overall, more trouble than they're worth. Except it's not totally baseless because I think that was actually a result of a study, IIRC.
000ooo000 1 hour ago | parent
>I'm [...] at $x, a frontier AI Security company
I really should check these before I bother engaging with posts boosting AI
khuedoan 3 hours ago | parent
In the C vs. Python case, we know the technical trade-offs and when to use what, but in AI productivity narratives, we keep pretending that the technical or cognitive debt created by AI doesn't exist.
Sure, person A can be 20% "faster" and suggest that this tool increases productivity by a magnitude, but if it costs person B 50% more time to review A's slop or clean up A's mess, the team's productivity doesn't really increase.
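With made-up numbers (purely illustrative, not from any study), that trade-off can be sketched as:

```python
# Hypothetical two-person team: A authors a feature, B reviews it.
# All hour figures below are assumptions chosen to mirror the comment above.
author_hours_before, review_hours_before = 10.0, 4.0

author_hours_after = author_hours_before / 1.20  # A is 20% "faster" with AI
review_hours_after = review_hours_before * 1.50  # B spends 50% more time reviewing

before = author_hours_before + review_hours_before
after = author_hours_after + review_hours_after

# A's individual speed-up is wiped out (and more) by B's extra review load.
print(f"before: {before:.1f} h/feature, after: {after:.1f} h/feature")
# → before: 14.0 h/feature, after: 14.3 h/feature
```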