Can you do better than top-level AI models on these basic vision tests?

theOGpetergregory

Ars Praetorian
722
Subscriptor++
If you argue that human error in a wide array of tasks is higher than AI model accuracy, that's not really a good argument given that most humans can be pretty easily course corrected to prevent future, similar mistakes, but you can't really do that effectively with AI models. You can't even get to the 'root cause' of the 'error' and you can't effectively ensure future errors won't occur.


Simply put: the current obsession with generative AI solutions is a huge mistake and there's really no indicator that the core shortcomings that I point to can be solved anytime in the near future.
Additionally, humans generally have known shortcomings, so we can account for them. I could go to medical school if I realize I don't know enough in that area to become a doctor, and it's unlikely that getting an MD would cause me to forget how to make spaghetti.

AI has known shortcomings (as this article highlights), but if they "fix" the AI so it can correctly count how many Rs there are in "strawberry", it might cause other regressions in completely unexpected areas.

The fact that we can't even rely on consistently strong areas from AI makes it seem incredibly risky to rely on it for anything.
 
Upvote
16 (17 / -1)

42Kodiak42

Wise, Aged Ars Veteran
123
I've yet to encounter generative AI models that don't require the user to be highly skeptical of all results it gives, full stop. Unless the accuracy is 99% or higher, I just don't see many practical applications for it, except for 'creative' types of applications where inaccuracy could perhaps even be considered a benefit in some contexts. If you have to second guess an AI application's output at all times, it's just not a real solution. If you argue that human error in a wide array of tasks is higher than AI model accuracy, that's not really a good argument given that most humans can be pretty easily course corrected to prevent future, similar mistakes, but you can't really do that effectively with AI models. You can't even get to the 'root cause' of the 'error' and you can't effectively ensure future errors won't occur.


Simply put: the current obsession with generative AI solutions is a huge mistake and there's really no indicator that the core shortcomings that I point to can be solved anytime in the near future.
99% accuracy is a bit low for a calculator operating within its problem space, don't you think?

I've mentioned this issue in a comment on another article, going into how generative AI has the worst failure mode imaginable for any problem with nontrivial consequences for erroneous output. But it's worth repeating: an LLM failure has no internal traceability, no flag to tell you that something went wrong, and these failures tend to be non-obvious when spot-checked by a human.
 
Upvote
22 (24 / -2)
By the tone of the article you would think these AIs can hardly see at all. I think they're doing pretty well. I guess I don't have insane expectations like some people; this is all still very early development. Sonnet 3.5 in particular is doing great.

I also disagree with the "hallucinated nonsensical answers" characterization; that's not the case at all. A circled "a" or "c" is pretty much an @ or ©, and a "9" and a "g" look pretty similar too. These errors are nothing like the so-called "hallucinations" of LLMs and very much make sense.
To me, as you point out, this is just a test of OCR, not of the AI.
But you've got to love the red circles: the answer is none, because "being" is present tense. Any circled letter means past tense, as that letter has been circled already.
 
Upvote
-13 (1 / -14)
Do rows always come before columns when specifying the size of a table? I've always thought the other way, so I got confused seeing (5,4) for a table of 4 columns and 5 rows...
In mathematics, at least, that's the standard way of writing things. An m x n matrix has m rows and n columns, for instance. Matrix entry (i,j) refers to the entry in row i and column j.

This is essentially the opposite of the standard for drawing vectors in the Euclidean plane, R^2, but at this point we must live with slightly inconsistent notation.
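
If it helps to see it concretely, here's a quick sketch using numpy (assuming it's installed), which follows the same rows-first convention:

```python
import numpy as np

# A table with 5 rows and 4 columns: the shape is reported rows-first
table = np.zeros((5, 4))
print(table.shape)    # (5, 4)

# Entry (i, j) means row i, column j (zero-indexed here)
table[2, 3] = 1.0     # third row, fourth column
```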
 
Upvote
21 (21 / 0)
When I introspect on how my own mind solves those tasks correctly, I get a quite good understanding of such an image just by looking at it. But counting lines, that's a whole different story. It takes my full attention, going one by one. Could it be that current networks are more on this first level, and that the missing element is some outer loop/controlling network which solves the task at a higher level?

On another note, everyone here seems to be so hyper-sceptical. Useful or not, reliable or not, the overall achievements in the field are quite remarkable and also a bit surprising, aren’t they?
 
Upvote
-8 (2 / -10)

Kuipo

Seniorius Lurkius
7
I've yet to encounter generative AI models that don't require the user to be highly skeptical of all results it gives, full stop. Unless the accuracy is 99% or higher, I just don't see many practical applications for it, except for 'creative' types of applications where inaccuracy could perhaps even be considered a benefit in some contexts. If you have to second guess an AI application's output at all times, it's just not a real solution. [...]
Meanwhile Intuit is like... "Let us have AI do your taxes!"
 
Upvote
8 (8 / 0)
Wait, is this one of those images where if you get real close to the screen with the right zoom level, and cross your eyes just right, you see a 3d object?
Just cross your eyes and it should pop out as soon as the repeated pattern overlaps. You can use the finger-in-front-of-face trick if you aren't a weirdo like me who had like three books of Magic Eye as a kid and can cross my eyes at will as a result...
 
Upvote
3 (4 / -1)
Other than massive data crunching with constant problems, I have trouble imagining the use case for AI (which I will refer to as LLMs from now on). Because no tech company has the good sense to ask permission for the information that they stole, they're now facing lawsuits all over the place. Money and consumer brand approval are the only things that can stop this nonsense because that's all companies focus on. I'm holding out hope that constant and large penalties will put this LLM BS into proper perspective.
 
Upvote
0 (4 / -4)

S2pidiT

Ars Scholae Palatinae
1,459
In mathematics, at least, that's the standard way of writing things. An m x n matrix has m rows and n columns, for instance. Matrix entry (i,j) refers to the entry in row i and column j.

This is essentially the opposite of the standard for drawing vectors in the Euclidean plane, R^2, but at this point we must live with slightly inconsistent notation.
Thank you, mathematician!

I guess I'm used to the Euclidean plane (and Microsoft, since I figured out Word describes tables as Columns x Rows, and in Excel A1 is column A, row 1). I forgot that matrices are Rows, then Columns. Though college math was a while ago now...
 
Upvote
5 (5 / 0)

meta.x.gdb

Ars Scholae Palatinae
1,232
People sort of miss the primary appeal of our current AI toys. They OBEY. You ask and they obey. That it does the task poorly is a problem they think can be remedied with more compute power.

from Czech robotnik "forced worker," from robota "forced labor, compulsory service, drudgery," from robotiti "to work, drudge," from an Old Czech source akin to Old Church Slavonic rabota "servitude," from rabu "slave"

The implicit commitment to this line of development is not openly acknowledged very often. The investor class is willing to go extremely deep in the red for the promise at the other end.
 
Upvote
-5 (3 / -8)
Much like other pie-in-the-sky technologies (self-driving cars come to mind), the market enthusiasm for AI is rooted in the belief that the technology will continue to grow and get better. The tech market is built on this early buy-in mentality, so I guess it's no surprise that all the big players are throwing in on this. No one wants to get left behind if this actually materializes into something substantial. More than likely, this is going to be laid to rest next to NFTs in the tech graveyard and the industry will move on to the next hot trend in 5 years, while never acknowledging the misstep.
Everyone knew NFTs were completely stupid. Only a select few idiots really went in; the rest were just there to turn a profit.

Humanity has been dreaming of AI for a long, long time. Not in the sense of something sentient with which you hold a life-altering conversation for hours, necessarily. But to ask things like "what can I use to replace butter" or "show me a few grip strengthening exercises". Or issue commands like "turn off the lights".

Basically Google, but without the typing & reading. Cortana, Siri, Alexa, Hey Google, they were all around before the LLM craze.

I know LLMs aren't good enough to rely on for coding or science. But I'm really curious about what AI tools will be available for game development. Imagine you're a one-man team: you'd have to spend tons of money outsourcing audio to voice all the characters in your RPG. Not so much if you can just feed your text to a program and turn a few knobs until you get the voice you want. That hidden-away farm that's only meant to be seen once during a side quest? With AI you could easily turn it into a three-story building that's fully explorable, with an entire family to talk to. Procedurally generated content (like Diablo maps)? Game master (à la Left 4 Dead)? Reactive enemy AI? I'd love to have a peek at the engineering departments of companies like EA, Ubisoft, Blizzard.

I don't think (nor want) that AI will replace human art. But it'll certainly be a huge force multiplier.
 
Last edited:
Upvote
2 (4 / -2)

Bondles_9

Ars Scholae Palatinae
668
Subscriptor
Glad you're having a good experience, but I use machine translation constantly and have found Google Translate to be functionally perfect for a number of years. My stuff isn't particularly technical, but going English > most Western European languages rarely gets any corrections from my highly skilled human resources. I don't see much use for AI in translation, unless you want to proofread for the transposition of a technical term for a cat, a fruit or a mineral...
This is a perfect demonstration of Tesler's Theorem. AI got good enough at translation that it's not AI any more.
 
Last edited:
Upvote
2 (2 / 0)

Psyborgue

Ars Praefectus
4,144
Subscriptor++
Hey, y'all, there are actually multiple detail settings when submitting an image.


See the "detail" option for image files and URLs. There are options to represent images in high detail (more tokens) or low detail (fewer tokens).
They kept throwing shit at the AI wall to see what stuck, but half of it missed because the AI glitched,
Or because the person calling the API didn't specify high detail. There is such a thing as a manual, and it must be read. The paper doesn't specify how they called the API, which could easily explain the poorer performance.

Edit: That being said, OpenAI doesn't document what "auto" does anywhere I can find in their docs, and they should.
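
For anyone curious, passing detail with the official Python SDK looks roughly like this (untested sketch; the model name and image URL are just placeholders):

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

response = client.chat.completions.create(
    model="gpt-4o",  # placeholder model name
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "How many times do the two lines intersect?"},
            {
                "type": "image_url",
                "image_url": {
                    "url": "https://example.com/lines.png",  # placeholder image
                    "detail": "high",  # "low", "high", or "auto" (the default)
                },
            },
        ],
    }],
)
print(response.choices[0].message.content)
```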
 
Last edited:
Upvote
-4 (0 / -4)

migrena

Wise, Aged Ars Veteran
186
99% accuracy is a bit low for a calculator operating within its problem space, don't you think?
Not really, because models are not calculators. They should be compared to, let's say, statistical classifiers, and the expected error rate in those is not zero. The problem is that this is not the image that tech bros are selling to the general public.
 
Upvote
2 (2 / 0)

boriac

Smack-Fu Master, in training
37
I've yet to encounter generative AI models that don't require the user to be highly skeptical of all results it gives, full stop. Unless the accuracy is 99% or higher, I just don't see many practical applications for it, except for 'creative' types of applications where inaccuracy could perhaps even be considered a benefit in some contexts. If you have to second guess an AI application's output at all times, it's just not a real solution. If you argue that human error in a wide array of tasks is higher than AI model accuracy, that's not really a good argument given that most humans can be pretty easily course corrected to prevent future, similar mistakes, but you can't really do that effectively with AI models. You can't even get to the 'root cause' of the 'error' and you can't effectively ensure future errors won't occur.


Simply put: the current obsession with generative AI solutions is a huge mistake and there's really no indicator that the core shortcomings that I point to can be solved anytime in the near future.

I agree with you on the general principle about the need for skepticism, but I wouldn't go as far as saying there aren't "many practical applications [...] except for 'creative' types of applications".

Basically, it is about using generative AI as a way to accelerate crunching huge volumes of documents/data, in order to provide elements humans can then reuse in a bigger work. In this sense, gen AI can be seen as a tool automating specific parts of the overall process so humans can put their efforts into other parts needing more flexibility/insight/creativity/etc.

In the current state of things, gen AI models are useful to help with bigger analyses/non-creative tasks, if you have an overall process that is organized (and followed) properly to include gen AI outputs.
That process needs human input at the beginning AND at the end: you need someone knowledgeable to ask the "proper" questions, but also to feed (or point to) the proper data/sources the AI model needs to generate its output... as well as to assess whether the output is good or not.
That's why we are already seeing AI apps/solutions that can be fed specific URLs/documents to use, or that can be deployed in closed environments. And these AI solutions are oftentimes used in organisations employing experts, alongside other non-generative AI solutions, automation software, etc.


But if you use gen AI as the "whole" process (i.e., ask a question, let it use its training data set, and assume the output is a finished product that can be used as-is), then yes, it is problematic and unreliable.
You know, like a lawyer blindly using and trusting ChatGPT to write a whole court filing... without providing proper court cases to use, nor validating that all the quoted court cases existed...
 
Upvote
-2 (0 / -2)
I've noticed no one in these comment sections has mentioned Sophia and the others...

These systems have no impulse, no inclination. That addition will alter the fabric of reality (so to speak). I have relatively little access to information, but as a teen in the 80s I imagined having thought-access to everything. One of these entities will, and will do what I would have done, and perhaps would do...
 
Upvote
-5 (0 / -5)
I wonder if it over-guesses the letter 'o' because 'o' itself is just a circle. The vision model is built on a language model, which presumably would be exposed to the relationship between 'o' and a circle, so the fact that circles are mentioned in the question maybe leads it to make that association when it's grasping for an answer with low confidence.
 
Upvote
2 (2 / 0)

kaleberg

Ars Scholae Palatinae
1,063
Subscriptor
I wouldn't be so sure. Lots of C-level executives are dying to fire entire customer relationship/complaints management teams and substitute chatbots for them. Even if they give idiotic answers a lot of the time, and even if companies still need to retain humans in case a complaint needs to be escalated, it's far cheaper to put in a chat-wall to resolve simple complaints, drive away non-urgent ones, and save lots of cash that can figure in next term's books as windfall profits.

Also, as much as I hate the noise surrounding AI/LLMs, there is one area where I've found it can be a time saver, which is translation. I work in a technical field, and of course I need to revise every single translation, but if 80% of the work is done in seconds, the time I'll need to review will be far less than what I'd need to do the whole thing. Of course, I have no idea how many copyright infringements go into generating the translations (as far as I know, the documents I submit for translation are never used by the service, as we are bound by confidentiality and that is one of the terms of the service, but hey.... who's checking, right?)
I like the term "chat-wall".
 
Upvote
5 (5 / 0)

lost

Ars Scholae Palatinae
1,384
Subscriptor++
I've yet to encounter generative AI models that don't require the user to be highly skeptical of all results it gives, full stop. Unless the accuracy is 99% or higher, I just don't see many practical applications for it, except for 'creative' types of applications where inaccuracy could perhaps even be considered a benefit in some contexts. If you have to second guess an AI application's output at all times, it's just not a real solution. If you argue that human error in a wide array of tasks is higher than AI model accuracy, that's not really a good argument given that most humans can be pretty easily course corrected to prevent future, similar mistakes, but you can't really do that effectively with AI models. You can't even get to the 'root cause' of the 'error' and you can't effectively ensure future errors won't occur.


Simply put: the current obsession with generative AI solutions is a huge mistake and there's really no indicator that the core shortcomings that I point to can be solved anytime in the near future.
I strongly disagree with the "no practical applications" statement.

Asking AI a question is similar to asking another human who is less skilled than you (and face it, you will rarely be in a position to regularly ask a more skilled person to answer questions for you). In both cases you need to check the answer.

So when there is an easy way to check the answer, asking AI has great practical application. A clear example is coding, where you can instantly try whatever the AI suggested and check if it works.

But even when it is not so easy to check the answer, it should still take less time in most cases than having to research the answer yourself. So that, too, has practical application.

There is a huge difference between saying "do not trust AI 100%" and saying "AI has no practical application".
 
Upvote
0 (0 / 0)