tl;dr: Query planners are a big deal.
In general, before an RDBMS backend executes a query,
it first plans the query.
There's more than one way to access the data,
and the planner evaluates some of those ways
to estimate which will be fastest.
- How is a sql query processed when there are indexes, that is, would this behavior change?
Given a "SELECT * FROM foo WHERE ..." query,
by default the plan would be to simply tablescan foo.
This is no different from $ grep ... foo.csv
,
requiring us to drag all the blocks off the
disk and into RAM so we can examine them.
Adding an index may offer a second access path,
if the WHERE conditional can be evaluated using just that index.
Retrieving < 1% of rows via b-tree index is quicker than tablescan.
(Some
indexes cover more than one column,
handy if the conditional mentions multiple columns.)
Adding more indexes can lead to an exponential explosion
of access paths that the planner might consider.
So, "would the behavior change?"
Yes, it would. We might find good
selectivity
and choose to filter a conjunct by probing
an index, rather than by resorting to tablescan.
- it also applies the same in MySQL, is that correct?
Yes, if you look under the hood of any mature RDBMS,
you will find a query planner that examines several
access paths, estimates their costs, and uses the
winning plan to execute the query.
- Do we have two logical query processings, one with indexes and one without indexes?
Much more than that.
Each time we add an index, each time we add a JOIN, each time
we add another condition to an ON or WHERE clause,
we multiply the number of potential access paths that
the planner might wish to consider.
There is an old and deep literature on query planning.
To better understand how your database operates
on your data, start habitually asking for EXPLAIN PLAN
steps and statistics. Create a table and then explain
a query. Add an index and explain again.
Then another index.
Then change the conjunct in your WHERE clause.
It's worth noting that plans are only interesting for
tables that have significantly more than a hundred rows.
And the distribution of values in your data will make
a difference to the plan, since the backend will try
to estimate the selectivity of each conjunct.
For teaching purposes we often manipulate just
half a dozen rows, but when examining plan choices
we need far more data than that, and it should
be somewhat realistic data.