Short explanation
This is a known issue for some set of users (particularly active users, unfortunately). We're working on a major review queue overhaul (Yaakov is heading up this effort).
Longer explanation
Review queues are an exercise in trade-offs, mainly performance and speed vs. accuracy. Each review queue has its own quirks, but the formula is the same for all of them. Every 5 minutes we run some queries that take eligible items for the queue and add them; we also diff what's no longer eligible (e.g. the score went up, it was already closed, etc.) and remove those. This phase relies on basic logic with a certain set of criteria per queue. It is not user-specific; it's only item-specific.
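To make the add/remove pass concrete, here is a minimal Python sketch of one sync run. It is not our actual code (the real thing is SQL queries on a schedule); `Item`, `is_eligible`, `sync_queue`, and the criteria themselves are hypothetical stand-ins for one queue's rules.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Item:
    id: int
    score: int
    closed: bool


def is_eligible(item: Item) -> bool:
    # Hypothetical per-queue criteria: still open and not positively scored.
    # Note there is nothing about any particular user in here.
    return not item.closed and item.score <= 0


def sync_queue(queue: set[int], items: list[Item]) -> tuple[set[int], set[int]]:
    """One periodic pass: add newly eligible items, drop ones that no longer qualify."""
    eligible = {item.id for item in items if is_eligible(item)}
    added = eligible - queue    # eligible but not queued yet
    removed = queue - eligible  # queued but no longer eligible
    queue |= added
    queue -= removed
    return added, removed
```

Because the pass only looks at item state, its output is the same for everyone, which is what makes it cheap to run on a timer.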
The above process maintains our queues, and the counts may fluctuate over time as events happen, but they're only global counts. Again, they're not per-user. These counts power the top bar and dots, because we need to access them on every page load for all logged-in users.
Now when you go to an actual review queue, the data is per-user. For example, in the close vote queue, if you've already voted to close a question, you wouldn't see that question in the queue. This is logic we need to apply when fetching the next item you are eligible to review (that reads backwards: more directly, we're really filtering out what you're not eligible to review).
Overall, the accurate, per-user version is too expensive to run on every page load for the top bar, so we can't do that.
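The gap between the two numbers can be shown with a small Python sketch. It is illustrative only: `global_count`, `next_item_for`, and the `already_acted` mapping are hypothetical names, and the real filtering involves more criteria than "already voted".

```python
from typing import Dict, List, Optional, Set


def global_count(queue: List[int]) -> int:
    # Cheap: one number shared by everyone, fine to read on every page load.
    return len(queue)


def next_item_for(
    user_id: int,
    queue: List[int],
    already_acted: Dict[int, Set[int]],
) -> Optional[int]:
    # Per-user: skip anything this user has already voted on or reviewed.
    # already_acted maps a user id to the item ids that user has handled.
    seen = already_acted.get(user_id, set())
    for item_id in queue:
        if item_id not in seen:
            return item_id
    return None  # nothing left for this user, whatever the global count says
```

With a queue of three items and a user who has already acted on all three, `global_count` still returns 3 while `next_item_for` returns `None`. That is the mismatch active reviewers run into: the indicator is driven by the first function, the queue page by the second.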
Future
One of the things we're doing some tech discovery and planning on now (Yaakov's leading this and I'm helping design the backend) is replacing the fundamentals of how review queues work. The "queue" (it's not really a queue) living in SQL and filtered there, with atomic concerns (e.g. many people concurrently reviewing an item) being handled by Redis, isn't ideal. The criteria for queues and who's eligible for what have gotten complicated over the years, and the queries are pretty damn ugly. We need a new approach. Things like an event-based system, where possible, to enter questions into the queue when they're closed (instead of on a 5-minute job), and similar event triggers based on queue type, are a lot more scalable, involve less duplicated/wasted effort, and let us extend and improve some things.
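As a rough illustration of the event-based idea (not the design we'll necessarily ship), here is a small in-process Python sketch. `EventBus`, `ReopenQueue`, and the event names `question_closed` and `question_reopened` are all hypothetical; a real version would sit on actual infrastructure and handle concurrency.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, List, Set

Handler = Callable[[int], None]


class EventBus:
    """Tiny in-memory publish/subscribe helper, for illustration only."""

    def __init__(self) -> None:
        self._handlers: DefaultDict[str, List[Handler]] = defaultdict(list)

    def subscribe(self, event: str, handler: Handler) -> None:
        self._handlers[event].append(handler)

    def publish(self, event: str, question_id: int) -> None:
        for handler in self._handlers[event]:
            handler(question_id)


class ReopenQueue:
    """Queue membership changes when an event fires, not on a 5-minute job."""

    def __init__(self, bus: EventBus) -> None:
        self.items: Set[int] = set()
        bus.subscribe("question_closed", self.items.add)
        bus.subscribe("question_reopened", self.items.discard)

    @property
    def count(self) -> int:
        # Always reflects the latest events, so it is closer to real time.
        return len(self.items)
```

Publishing `question_closed` for a question puts it in the queue immediately, and `question_reopened` takes it back out, so nothing has to rescan every item on a timer to find out what changed.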
I can't promise at this point that the count situation will get better for your case, which comes down to eligibility factors being excluded from the counts, but the counts will be better for the exhaustion-not-yet-reflected and similar scenarios, since they'll generally be closer to real-time overall.