Can you do better than top-level AI models on these basic vision tests?

theOGpetergregory

Ars Praetorian
722
Subscriptor++
If you argue that human error in a wide array of tasks is higher than AI model accuracy, that's not really a good argument given that most humans can be pretty easily course corrected to prevent future, similar mistakes, but you can't really do that effectively with AI models. You can't even get to the 'root cause' of the 'error' and you can't effectively ensure future errors won't occur.


Simply put: the current obsession with generative AI solutions is a huge mistake and there's really no indicator that the core shortcomings that I point to can be solved anytime in the near future.
Additionally, humans generally have known shortcomings, so we can account for them. I could go to medical school if I realize I don't know enough in that area to become a doctor, and it's unlikely that getting an MD would cause me to forget how to make spaghetti.

AI has known shortcomings (as this article highlights), but if they "fix" the AI so it can correctly count how many Rs there are in "strawberry", it might cause other regressions in completely unexpected areas.

The fact that we can't even rely on consistently strong areas from AI makes it seem incredibly risky to rely on it for anything.
 
Upvote
16 (17 / -1)

42Kodiak42

Wise, Aged Ars Veteran
123
I've yet to encounter generative AI models that don't require the user to be highly skeptical of all results it gives, full stop. Unless the accuracy is 99% or higher, I just don't see many practical applications for it, except for 'creative' types of applications where inaccuracy could perhaps even be considered a benefit in some contexts. If you have to second guess an AI application's output at all times, it's just not a real solution. If you argue that human error in a wide array of tasks is higher than AI model accuracy, that's not really a good argument given that most humans can be pretty easily course corrected to prevent future, similar mistakes, but you can't really do that effectively with AI models. You can't even get to the 'root cause' of the 'error' and you can't effectively ensure future errors won't occur.


Simply put: the current obsession with generative AI solutions is a huge mistake and there's really no indicator that the core shortcomings that I point to can be solved anytime in the near future.
99% accuracy is a bit low for a calculator operating within its problem space, don't you think?

I've mentioned this issue in a comment on another article, going into how generative AI has the worst failure mode imaginable for any problem with nontrivial consequences for erroneous output. But it's worth repeating: an LLM failure has no internal traceability, no flag to tell you that something went wrong, and these failures tend to be non-obvious when spot-checked by a human.
 
Upvote
22 (24 / -2)
By the tone of the article you would think these AIs can hardly see at all. I think they're doing pretty well. I guess I don't have insane expectations like some people; this is all still very early development. Sonnet 3.5 in particular is doing great.

I also disagree with the "hallucinated nonsensical answers" characterization; that's not the case at all. A circled "a" or "c" is pretty much an @ or ©, and a "9" and a "g" look pretty similar too. These errors are nothing like the so-called "hallucinations" of LLMs and very much make sense.
To me, as you point out, this is just a test of OCR, not of the AI.
But you've got to love the red circles: the answer is none, because "being" is present tense. Any circled letter means past tense, as that letter has been circled already.
 
Upvote
-13 (1 / -14)
Do rows always come before columns when specifying the size of a table? I've always thought the other way, so I got confused seeing (5,4) for a table of 4 columns and 5 rows...
In mathematics, at least, that's the standard way of writing things. An m x n matrix has m rows and n columns, for instance. Matrix entry (i,j) refers to the entry in row i and column j.

This is essentially the opposite of the standard for drawing vectors in the Euclidean plane, R^2, but at this point we must live with slightly inconsistent notation.
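
If it helps to see it concretely, here's a quick sketch using numpy (assuming it's installed), which follows the same rows-first convention:

```python
import numpy as np

# A table with 5 rows and 4 columns: the shape is reported rows-first
table = np.zeros((5, 4))
print(table.shape)    # (5, 4)

# Entry (i, j) means row i, column j (zero-indexed here)
table[2, 3] = 1.0     # third row, fourth column
```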
 
Upvote
21 (21 / 0)
When I introspect on how my own mind solves those tasks correctly, I get a quite good understanding of such an image just by looking at it. But counting lines, that's a whole different story. It takes my full attention, going one by one. Could it be that current networks are more on this first level, and that the missing element is some outer loop/controlling network which solves the task at a higher level?

On another note, everyone here seems to be so hyper-sceptical. Useful or not, reliable or not, the overall achievements in the field are quite remarkable and also a bit surprising, aren’t they?
 
Upvote
-8 (2 / -10)

Kuipo

Seniorius Lurkius
7
I've yet to encounter generative AI models that don't require the user to be highly skeptical of all results it gives, full stop. Unless the accuracy is 99% or higher, I just don't see many practical applications for it, except for 'creative' types of applications where inaccuracy could perhaps even be considered a benefit in some contexts. If you have to second guess an AI application's output at all times, it's just not a real solution. [...]
Meanwhile Intuit is like... "Let us have AI do your taxes!"
 
Upvote
8 (8 / 0)
Wait, is this one of those images where if you get real close to the screen with the right zoom level, and cross your eyes just right, you see a 3d object?
Just cross your eyes and it should pop out as soon as the repeated pattern overlaps. You can use the finger-in-front-of-face trick if you aren't a weirdo like me who had like three books of Magic Eye as a kid and can cross my eyes at will as a result...
 
Upvote
3 (4 / -1)
Other than massive data crunching with constant problems, I have trouble imagining the use case for AI (which I will refer to as LLMs from now on). Because no tech company has the good sense to ask permission for the information that they stole, they're now facing lawsuits all over the place. Money and consumer brand approval are the only things that can stop this nonsense because that's all companies focus on. I'm holding out hope that constant and large penalties will put this LLM BS into proper perspective.
 
Upvote
0 (4 / -4)

S2pidiT

Ars Scholae Palatinae
1,459
In mathematics, at least, that's the standard way of writing things. An m x n matrix has m rows and n columns, for instance. Matrix entry (i,j) refers to the entry in row i and column j.

This is essentially the opposite of the standard for drawing vectors in the Euclidean plane, R^2, but at this point we must live with slightly inconsistent notation.
Thank you, mathematician!

I guess I'm used to the Euclidean plane (and Microsoft, since I figured out Word describes tables as Columns x Rows, and in Excel A1 is column A, row 1). I forgot that matrices are Rows, then Columns. Though college math was a while ago now...
 
Upvote
5 (5 / 0)

meta.x.gdb

Ars Scholae Palatinae
1,232
People sort of miss the primary appeal of our current AI toys. They OBEY. You ask and they obey. That it does the task poorly is a problem they think can be remedied with more compute power.

from Czech robotnik "forced worker," from robota "forced labor, compulsory service, drudgery," from robotiti "to work, drudge," from an Old Czech source akin to Old Church Slavonic rabota "servitude," from rabu "slave"

The implicit commitment to this line of development is not openly acknowledged very often. The investor class is willing to go extremely deep in the red for the promise at the other end.
 
Upvote
-5 (3 / -8)
Much like other pie-in-the-sky technologies (self-driving cars come to mind), the market enthusiasm for AI is rooted in the belief that the technology will continue to grow and get better. The tech market is built on this early buy-in mentality, so I guess it's no surprise that all the big players are throwing in on this. No one wants to get left behind if this actually materializes into something substantial. More than likely, this is going to be laid to rest next to NFTs in the tech graveyard and the industry will move on to the next hot trend in 5 years, while never acknowledging the misstep.
Everyone knew NFTs were completely stupid. Only a select few idiots really went in; the rest were just there to turn a profit.

Humanity has been dreaming of AI for a long, long time. Not in the sense of something sentient with which you hold a life-altering conversation for hours, necessarily. But to ask things like "what can I use to replace butter" or "show me a few grip strengthening exercises". Or issue commands like "turn off the lights".

Basically Google, but without the typing & reading. Cortana, Siri, Alexa, Hey Google, they were all around before the LLM craze.

I know LLMs aren't good enough to rely on for coding or science. But I'm really curious about what AI tools will be available for game development. Imagine you're a one-man team: you'd have to spend tons of money outsourcing audio to voice all the characters in your RPG. Not so much if you can just feed your text to a program and turn a few knobs until you get the voice you want. That hidden-away farm that's only meant to be seen once during a side quest? With AI you could easily turn it into a three-story building that's fully explorable, with an entire family to talk to. Procedurally generated content (like Diablo maps)? Game master (à la Left 4 Dead)? Reactive enemy AI? I'd love to have a peek at the engineering departments of companies like EA, Ubisoft, Blizzard.

I don't think (nor want) that AI will replace human art. But it'll certainly be a huge force multiplier.
 
Last edited:
Upvote
2 (4 / -2)

Bondles_9

Ars Scholae Palatinae
668
Subscriptor
Glad you're having a good experience, but I use machine translation constantly and have found Google Translate to be functionally perfect for a number of years. My stuff isn't particularly technical, but going English > most Western European languages rarely gets any corrections from my highly skilled human resources. I don't see much use for AI in translation, unless you want to proofread for the transposition of a technical term for a cat, a fruit or a mineral...
This is a perfect demonstration of Tesler's Theorem. AI got good enough at translation that it's not AI any more.
 
Last edited:
Upvote
2 (2 / 0)

Psyborgue

Ars Praefectus
4,144
Subscriptor++
Hey, y'all, there are actually multiple detail settings when submitting an image.


See the "detail" option for image files and URLs. There are options to represent images in high detail (more tokens) or low detail (fewer tokens).
They kept throwing shit at the AI wall to see what stuck, but half of it missed because the AI glitched,
Or because the person calling the API didn't specify high detail. There is such a thing as a manual, and it must be read. The paper doesn't specify how they called the API, which could easily explain the poorer performance.

Edit: That being said, OpenAI doesn't document what "auto" does anywhere I can find in their docs, and they should.
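
For anyone curious, passing detail with the official Python SDK looks roughly like this (untested sketch; the model name and image URL are just placeholders):

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

response = client.chat.completions.create(
    model="gpt-4o",  # placeholder model name
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "How many times do the two lines intersect?"},
            {
                "type": "image_url",
                "image_url": {
                    "url": "https://example.com/lines.png",  # placeholder image
                    "detail": "high",  # "low", "high", or "auto" (the default)
                },
            },
        ],
    }],
)
print(response.choices[0].message.content)
```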
 
Last edited:
Upvote
-4 (0 / -4)

migrena

Wise, Aged Ars Veteran
186
99% accuracy is a bit low for a calculator operating within its problem space, don't you think?
Not really, because models are not calculators. They should be compared to, let's say, statistical classifiers, and the expected error rate in those is not zero. The problem is that this is not the image that tech bros are selling to the general public.
 
Upvote
2 (2 / 0)

boriac

Smack-Fu Master, in training
37
I've yet to encounter generative AI models that don't require the user to be highly skeptical of all results it gives, full stop. Unless the accuracy is 99% or higher, I just don't see many practical applications for it, except for 'creative' types of applications where inaccuracy could perhaps even be considered a benefit in some contexts. If you have to second guess an AI application's output at all times, it's just not a real solution. If you argue that human error in a wide array of tasks is higher than AI model accuracy, that's not really a good argument given that most humans can be pretty easily course corrected to prevent future, similar mistakes, but you can't really do that effectively with AI models. You can't even get to the 'root cause' of the 'error' and you can't effectively ensure future errors won't occur.


Simply put: the current obsession with generative AI solutions is a huge mistake and there's really no indicator that the core shortcomings that I point to can be solved anytime in the near future.

I agree with you on the general principle about the need for skepticism, but I wouldn't go as far as saying there aren't "many practical applications [...] except for 'creative' types of applications".

Basically, it is about using generative AI as a way to accelerate crunching huge volumes of documents/data, in order to provide elements humans can then reuse in a bigger work. In this sense, gen AI can be seen as a tool automating specific parts of the overall process so humans can put their efforts into other parts needing more flexibility/insight/creativity/etc.

In the current state of things, gen AI models are useful to help with bigger analyses/non-creative tasks, if you have an overall process that is organized (and followed) properly to include gen AI outputs.
That process needs human input at the beginning AND at the end: you need someone knowledgeable to ask the "proper" questions, but also to feed (or point to) the proper data/sources the AI model needs to generate its output... as well as to assess whether the output is good or not.
That's why we are already seeing AI apps/solutions that can be fed specific URLs/documents to use, or that can be deployed in closed environments. And these AI solutions are oftentimes used in organisations employing experts, alongside other non-generative AI solutions, automation software, etc.


But if you use gen AI as the "whole" process (i.e., ask a question, let it use its training data set, and assume the output is a finished product that can be used as-is), then yes, it is problematic and unreliable.
You know, like a lawyer blindly using and trusting ChatGPT to write a whole court filing... without providing proper court cases to use, nor validating that all the quoted court cases existed...
 
Upvote
-2 (0 / -2)
I've noticed no one in these comment sections has mentioned Sophia and the others...

These systems have no impulse, no inclination. That addition will alter the fabric of reality (so to speak). I have relatively little access to information, but as a teen in the 80s I imagined having thought-access to everything. One of these entities will, and will do what I would have done, and perhaps would do...
 
Upvote
-5 (0 / -5)
I wonder if it over-guesses the letter 'o' because 'o' itself is just a circle. The vision model is built on a language model, which presumably would be exposed to the relationship between 'o' and a circle, so the fact that circles are mentioned in the question maybe leads it to make that association when it's grasping for an answer with low confidence.
 
Upvote
2 (2 / 0)

kaleberg

Ars Scholae Palatinae
1,063
Subscriptor
I wouldn't be so sure. Lots of C-level executives are dying to fire entire customer relationship/complaints management teams and substitute chatbots for them. Even if they give idiotic answers a lot of the time, and even if companies still need to retain humans in case a complaint needs to be escalated, it's far cheaper to put in a chat-wall to resolve simple complaints, drive away non-urgent ones, and save lots of cash that can figure in next term's books as windfall profits.

Also, as much as I hate the noise surrounding AI/LLMs, there is one area where I've found it can be a time saver, which is translation. I work in a technical field, and of course I need to revise every single translation, but if 80% of the work is done in seconds, the time I'll need to review will be far less than what I'd need to do the whole thing. Of course, I have no idea how many copyright infringements go into generating the translations (as far as I know, the documents I submit for translation are never used by the service, as we are bound by confidentiality and that is one of the terms of the service, but hey.... who's checking, right?)
I like the term "chat-wall".
 
Upvote
5 (5 / 0)

lost

Ars Scholae Palatinae
1,384
Subscriptor++
I've yet to encounter generative AI models that don't require the user to be highly skeptical of all results it gives, full stop. Unless the accuracy is 99% or higher, I just don't see many practical applications for it, except for 'creative' types of applications where inaccuracy could perhaps even be considered a benefit in some contexts. If you have to second guess an AI application's output at all times, it's just not a real solution. If you argue that human error in a wide array of tasks is higher than AI model accuracy, that's not really a good argument given that most humans can be pretty easily course corrected to prevent future, similar mistakes, but you can't really do that effectively with AI models. You can't even get to the 'root cause' of the 'error' and you can't effectively ensure future errors won't occur.


Simply put: the current obsession with generative AI solutions is a huge mistake and there's really no indicator that the core shortcomings that I point to can be solved anytime in the near future.
I strongly disagree with the "no practical applications" statement.

Asking AI a question is similar to asking another human who is less skilled than you (and face it, you will rarely be in a position to regularly ask a more skilled person to answer questions for you). In both cases you need to check the answer.

So when there is an easy way to check the answer, asking AI has great practical application. A clear example is coding, where you can instantly try whatever the AI suggested and check if it works.

But even when it is not so easy to check the answer, it should still take less time in most cases than having to research the answer yourself. So that, too, has practical application.

There is a huge difference between saying "do not trust AI 100%" and saying "AI has no practical application".
 
Upvote
0 (0 / 0)