Model Evaluation and Threat Research is an AI research charity that looks into the threat of AI agents! That sounds a bit AI doomsday cult, and they take funding from the AI doomsday cult organisat…
The study was centered on bugfixing in large established projects. That is not really the kind of task AI helpers excel at.
Also, the small number of participants (16), the fact that the participants were familiar with the code base, and that all the tasks seem to have been fairly short can skew the results.
Hence the divergence between the study's results and many people's personal experience of a productivity increase: they are doing different tasks in a different scenario.
I find it more useful doing large language transformations and delving into unknown patterns, languages or environments.
If I know a source head to toe, and I’m proficient with that environment, it’s going to offer little help. Especially if it’s a highly specialized problem.
Since the SVB crash there have been firings left and right. I suspect AI is only an excuse for them.
Same experience here: performance is mediocre at best on an established code base. Recall tends to drop sharply as the context expands, leading to a lot of errors.
I’ve found coding agents to be great at bootstrapping projects on popular stacks, but once you reach a certain size it’s better to either make them work on isolated files, or code manually and rely on the autocomplete.
So far I’ve only found it useful when describing bite-sized tasks in order to get suggestions on which functions are useful from the library/API I’m using. And only when those functions have documentation available on the Internet.
Call me crazy but I think developers should understand what they’re working on, and using LLM tools doesn’t provide a shortcut there.
“AI is good for Hello World projects written in javascript.”
Managers will still fire real engineers though.