Abstract analysis that is trivial for humans often stymies GPT-4o, Gemini, and Sonnet.
> If you argue that human error in a wide array of tasks is higher than AI model accuracy, that's not really a good argument, given that most humans can be pretty easily course-corrected to prevent future, similar mistakes [...]

Additionally, humans generally have known shortcomings, so we can account for them. I could go to medical school if I realize I don't know enough in that area to become a doctor, and it's unlikely that getting an MD will cause me to forget how to make spaghetti.
> I've yet to encounter generative AI models that don't require the user to be highly skeptical of all results they give, full stop. Unless the accuracy is 99% or higher, I just don't see many practical applications for it [...]

99% accuracy is a bit low for a calculator operating within its problem space, don't you think?
> By the tone of the article you would think these AIs hardly see at all. I think they're doing pretty well. I guess I don't have insane expectations like some people; this is all still very early development. Sonnet 3.5 in particular is doing great.
>
> I also disagree on the "hallucinated nonsensical answers" qualification; these are not that at all. A circled "a" or "c" is pretty much an "@" or "©", and a "9" and a "g" look pretty similar too. These errors are nothing like the so-called "hallucinations" of LLMs and very much make sense.

To me, as you point out, this is just a test of OCR, not of the AI.
> Do rows always come before columns when specifying the size of a table? I've always thought the other way, so I got confused seeing (5, 4) for a table of 4 columns and 5 rows...

In mathematics, at least, that's the standard way of writing things. An m × n matrix has m rows and n columns, for instance. Matrix entry (i, j) refers to the entry in row i and column j.
> I've yet to encounter generative AI models that don't require the user to be highly skeptical of all results they give, full stop. [...]

Meanwhile Intuit is like... "Let us have AI do your taxes!"
> Soon:

Wait, is this one of those images where, if you get real close to the screen with the right zoom level and cross your eyes just right, you see a 3D object?
> Wait, is this one of those images where, if you get real close to the screen with the right zoom level and cross your eyes just right, you see a 3D object?

Just cross your eyes and it should pop out as soon as the repeated pattern overlaps. You can use the finger-in-front-of-face trick if you aren't a weirdo like me who had like 3 books of Magic Eye as a kid and can cross my eyes at will as a result...
> In mathematics, at least, that's the standard way of writing things. An m × n matrix has m rows and n columns [...]

Thank you, mathematician!

This is essentially the opposite of the standard for drawing vectors in the Euclidean plane, R^2, but at this point we must live with slightly inconsistent notation.
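For the curious, here is a minimal sketch of the two conventions side by side (illustrative only; it uses NumPy, which follows the row-first rule):

```python
import numpy as np

# A "5 x 4 table": 5 rows, 4 columns -- the shape is (rows, columns).
table = np.arange(20).reshape(5, 4)
print(table.shape)   # (5, 4)

# Matrix entry (i, j) = row i, column j (zero-based here).
print(table[1, 3])   # row 1, column 3 -> 7

# Contrast with (x, y) point coordinates in the plane, where the
# horizontal coordinate comes first: the point (3, 1) names column 3,
# row 1 -- the same entry, with the indices swapped.
x, y = 3, 1
print(table[y, x])   # -> 7, same entry as table[1, 3]
```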
> Much like other pie-in-the-sky technologies (self-driving cars come to mind), the market enthusiasm for AI is rooted in the belief that the technology will continue to grow and get better. The tech market is built on this early buy-in mentality, so I guess it's no surprise that all the big players are throwing in on this. No one wants to get left behind if this actually materializes into something substantial. More than likely, this is going to be laid to rest next to NFTs in the tech graveyard and the industry will move on to the next hot trend in 5 years, while never acknowledging the misstep.

Everyone knew NFTs were completely stupid. Only a select few idiots really went in; the rest were just there to turn a profit.
> glad you're having a good experience, but i use machine translation constantly and have found google translate to be functionally perfect for a number of years. my stuff isn't particularly technical, but going english > most western european languages rarely gets any corrections from my highly skilled human resources. i don't see much use for AI in translation, unless you want to proofread for the transposition of a technical term for a cat, a fruit or a mineral...

This is a perfect demonstration of Tesler's Theorem: AI got good enough at translation that it's not AI any more.
"It's not a schooner, it's a sailboat!"Soon:
> They kept throwing shit at the AI wall to see what stuck, but half of it missed because the AI glitched,

Or because the person calling the API didn't specify high detail. There is such a thing as a manual, and it must be read. It is not specified in the paper how they called the API, which could easily explain the poorer performance.
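For reference, here is roughly what pinning the detail level looks like with the OpenAI Python client (a sketch, not the paper's code; the model name, prompt, and image URL are placeholders, and `detail` defaults to "auto" when omitted):

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Vision request with image fidelity pinned explicitly: "high" asks the
# service to process the image at full resolution rather than a
# downscaled "low" version. Model and URL are illustrative placeholders.
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "How many rows does this table have?"},
            {
                "type": "image_url",
                "image_url": {
                    "url": "https://example.com/table.png",
                    "detail": "high",  # vs. "low" or the default "auto"
                },
            },
        ],
    }],
)
print(response.choices[0].message.content)
```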
> They OBEY

That right there is funny.
> Soon:

Quite a few people can't do that.
> Fig 15, with the human researcher failure at "then" vs "than", was disappointing to me. I wonder if AI can reliably get it right.

You can always blame autocorrect.
> 99% accuracy is a bit low for a calculator operating within its problem space, don't you think?

Not really, because models are not calculators. They should be compared to, let's say, statistical classifiers, and the expected error rate of those is not zero. The problem is that this is not the image that tech-bros are selling to the general public.
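To make the compounding concrete, a quick back-of-the-envelope sketch (the accuracy figure and counts are hypothetical): at 99% per-answer accuracy, fully error-free output becomes unlikely as soon as a task involves more than a handful of answers.

```python
# Hypothetical numbers: per-answer accuracy p, applied to n independent
# answers (e.g., cells extracted from a table).
p = 0.99
for n in (10, 100, 1000):
    all_correct = p ** n           # chance that every answer is right
    expected_errors = (1 - p) * n  # average number of wrong answers
    print(f"n={n:5d}  P(all correct)={all_correct:.3f}  "
          f"expected errors={expected_errors:.1f}")

# n=   10  P(all correct)=0.904  expected errors=0.1
# n=  100  P(all correct)=0.366  expected errors=1.0
# n= 1000  P(all correct)=0.000  expected errors=10.0
```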
I've yet to encounter generative AI models that don't require the user to be highly skeptical of all results they give, full stop. Unless the accuracy is 99% or higher, I just don't see many practical applications for it, except for 'creative' types of applications where inaccuracy could perhaps even be considered a benefit in some contexts. If you have to second-guess an AI application's output at all times, it's just not a real solution.

If you argue that human error in a wide array of tasks is higher than AI model accuracy, that's not really a good argument, given that most humans can be pretty easily course-corrected to prevent future, similar mistakes, but you can't really do that effectively with AI models. You can't even get to the 'root cause' of the 'error', and you can't effectively ensure future errors won't occur.

Simply put: the current obsession with generative AI solutions is a huge mistake, and there's really no indicator that the core shortcomings I point to can be solved anytime in the near future.
> I wouldn't be so sure. Lots of C-level executives are dying to fire entire customer relationship/complaints management teams and substitute chatbots. Even if they give idiotic answers a lot of the time, and even if companies still need to retain humans in case a complaint needs to be escalated, it's far cheaper to put in a chat-wall to resolve simple complaints, drive away non-urgent ones, and save lots of cash that can figure in next term's books as windfall profits.
>
> Also, as much as I hate the noise surrounding AI/LLMs, there is one area where I've found it can be a time saver, which is translation. I work in a technical field, and of course I need to revise every single translation, but if 80% of the work is done in seconds, the time I need to review is far less than what I'd need to do the whole thing. Of course, I have no idea how many copyright infringements go into generating the translations (as far as I know, the documents I submit for translation are never used by the service, as we are bound by confidentiality and that is one of the terms of the service, but hey... who's checking, right?)

I like the term "chat-wall".
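As an aside on the translate-then-review workflow quoted above, the machine-draft step can be a few lines; a sketch assuming the google-cloud-translate basic client (credentials, the sample strings, and the target language are placeholders):

```python
# pip install google-cloud-translate  (v2 "basic" API; credentials are
# assumed to be configured via GOOGLE_APPLICATION_CREDENTIALS)
from google.cloud import translate_v2 as translate

client = translate.Client()

# Machine-translate a draft; a human reviewer then corrects the ~20%
# that needs fixing instead of translating everything from scratch.
source_paragraphs = [
    "The pump must be primed before first use.",
    "Do not exceed the rated operating pressure.",
]
for text in source_paragraphs:
    result = client.translate(text, target_language="de")
    print(result["translatedText"])  # draft for human review
```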
> Soon:

Insufficient, since some humans just can't make them work. Between my astigmatism and color blindness, they literally never work for me, despite consistently trying all the tricks everyone says work.
> I've yet to encounter generative AI models that don't require the user to be highly skeptical of all results they give, full stop. Unless the accuracy is 99% or higher, I just don't see many practical applications for it [...]

I strongly disagree with the "no practical applications" statement.