"...but Mike, will SQL perform as well as my RPG/COBOL program"?
Before I provide an answer (and it's not exactly what you think), let's explore a few things.
According to merriam-webster.com
TRUST is defined as:
- assured reliance on the character, ability, strength, or truth of someone or something
- one in which confidence is placed
And CONFIDENCE is defined as:
- a feeling or consciousness of one's powers or of reliance on one's circumstances
- faith or belief that one will act in a right, proper, or effective way
So, if you don't trust the query optimizer, what you are saying is that you don't have confidence in it's ability. If you don't have confidence in the query optimizer, you are really stating that it will not act in the right, proper or effective way. And to make matters worse, you find yourself stuck between a rock and a hard place. Because how can you make use of the latest and greatest DB2 features available only via SQL, if you don't have faith in the SQL query optimizer?
Hold on a minute... how do YOU know what the right, proper or effective way really is?
Are you as omnipresent as the query optimizer?
By the way, Merriam Webster tells us OMNIPRESENT means: present in all places at all times.
Query Optimization Defined
The query optimizer's job is to produce a plan. Producing a plan is the result of many factors. Factors that will limit or expand what is possible with regards to the strategies and methods of execution to produce the correct result in a timely manner.
Knowing and understanding these factors are key to trusting the query optimizer. Experience with how these factors affect the final plan also goes a long way in building confidence that the query optimizer has provided the best plan for your query, your data and your environment.
What is the "best" plan anyway?
Most folks don't appreciate that the query optimizer has two equally important jobs.
Job 1: build a plan that produces the CORRECT output
Job 2: build a plan that is ESTIMATED to be the fastest possible given all the factors present and accounted for
We all forget (and thus have full trust and confidence in) the optimizer's first job. We all assume the plan for our query will produce the correct output. Don't believe me? When was the last time you actually validated your query's output was correct? HA! You DO trust the optimizer don't you.
What most users focus on is Job 2. In other words, we all keep our eyes on the query run time. This is where we begin to loose faith. This is where we loose confidence with the programmer in the box.
Ah, The Programmer in the Box
Or is it The Ghost in the Machine
If we think of the query plan as a program - a program executed by the database engine and managed by the operating system, then there must be a programmer behind the scenes to design and code that program. In essence, this is exactly what the query optimizer is. So, getting back to the original question on top, you are really asking: "can I trust the programmer in the box". Let's open the box and peer inside the machine...
The programmer is highly trained, highly experienced, up-to-date on all things computer science past and present. The programmer never sleeps, is always ready. The programmer is fast - capable of designing, writing and compiling programs in milliseconds. The programmer is vigilant - watching, learning, adapting to the changing environment.
What does the programmer have to work with? What are the requirements and the inputs?
As with any programming endeavor, query optimization starts with a request, and continues with identifying the attributes of the underlying data and data structures, as well as the computing and I/O resources available.
Most users realize that the SQL statement is the request. This request, can include items such as:
- Column list projected or returned
- Source or target of query
- Constraining or identify criteria
- Grouping or aggregation criteria
- Constraining criteria on groups or aggregates
- Ordering criteria for final result set
- Constraining number of results
Most SQL users DO NOT fully understand or appreciate the data profile, the data structures, and the system's computing and I/O capabilities (speeds and feeds if you will). Yet, the programmer in the box certainly does.
Imagine a programmer that not only has all the information, but also has access to all the structures above and below the machine interface. Imagine having access to the latest computational techniques materialized in the IBM i microcode. This would be a true advantage, and hopefully would in turn inspire confidence and trust in the user.
The Key Factors
From my perspective, the 3.5 most important factors that positively or negatively affect the programmer in the box are:
1. Indexing strategy
2. Fair share of memory
3. Optimization goal
3.5. Reusable open data path (ODP)
Without an adequate indexing strategy, the query optimizer is limited in the techniques available. Not unlike the RPG or COBOL programmer who wants desperately to access one row by key (think CHAIN or random read), but no one provided the correct keyed logical file that enables the technique.
Without an adequate fair share of memory, the database engine is limited in how aggressive it can be with asynchronously reading data, and keeping that data in memory to minimize wait time. Not unlike the RPG or COBOL programmer who wants desperately to use an internal array or table, but can't due to size limitations.
Without properly communicating the optimization goal, the query optimizer is unaware of the application's fetch behavior. This can result in a plan that is best for returning all of the query results instead of a plan that returns the first subset of results. Not unlike the RPG or COBOL programmer who reads all the data and loads the entire subfile, instead of filling the first page and returning control to the user sooner than later.
Without proper data base design, good set-at-a-time coding techniques and data centric processing, the query optimizer is forced to revisit the query plan over and over. To do this, the cursor is (hard) closed and the query optimizer is called upon to ensure that the current plan is doing the right things with the right objects. Not unlike the RPG or COBOL programmer who is frequently and repeatedly tapped to end their program, revisit the design, objects and techniques - all while the user patiently waits for the program to come back on line.
Criticize Things You Don't Know About
In my experience, trust and confidence come from KNOWLEDGE, EXPERIENCE and TIME. In other words, the more knowledge you have, the more experience you have, and the more time you've spent, the more comfortable you become. This is the same with query optimization. You become wiser to the ways and means of the programmer in the box.
Those of us who have worked extensively with the DB2 for i Query Optimizer (and the fantastic programmers of the box),
It just so happens you can acquire the knowledge and initial experience with SQL query optimization by attending the very workshop dedicated to it. This class is the very best way, short of a master/apprentice relationship to get more confidence and trust in DB2 for i.
More information on the workshop can be found here.
...But Mike, Can I Trust the Query Optimizer?
My knowledge and experience tells me, "yes, you can".