Friday, 2 November 2018

Mysql: INNER JOIN ON vs WHERE clause


For simplicity, assume all relevant fields are NOT NULL.
You can do:
SELECT
    table1.this, table2.that, table2.somethingelse
FROM
    table1, table2
WHERE
    table1.foreignkey = table2.primarykey
    AND (some other conditions)
Or else:
SELECT
    table1.this, table2.that, table2.somethingelse
FROM
    table1 INNER JOIN table2
    ON table1.foreignkey = table2.primarykey
WHERE
    (some other conditions)
Do these two work on the same way in MySQL?

 Answers


INNER JOIN is ANSI syntax which you should use.
It is generally considered more readable, especially when you join lots of tables.
It can also be easily replaced with an OUTER JOIN whenever a need arises.
The WHERE syntax is more relational model oriented.
A result of two tables JOINed is a cartesian product of the tables to which a filter is applied which selects only those rows with joining columns matching.
It's easier to see this with the WHERE syntax.
As for your example, in MySQL (and in SQL generally) these two queries are synonyms.
Also note that MySQL also has a STRAIGHT_JOIN clause.
Using this clause, you can control the JOIN order: which table is scanned in the outer loop and which one is in the inner loop.
You cannot control this in MySQL using WHERE syntax.



Applying conditional statements in ON / WHERE
Here I have explained about the logical query processing steps.

Reference : Inside Microsoft® SQL Server™ 2005 T-SQL Querying 
Publisher: Microsoft Press 
Pub Date: March 07, 2006 
Print ISBN-10: 0-7356-2313-9 
Print ISBN-13: 978-0-7356-2313-2 
Pages: 640 
(8)  SELECT (9) DISTINCT (11) TOP <top_specification> <select_list>
(1)  FROM <left_table>
(3)       <join_type> JOIN <right_table>
(2)       ON <join_condition>
(4)  WHERE <where_condition>
(5)  GROUP BY <group_by_list>
(6)  WITH {CUBE | ROLLUP}
(7)  HAVING <having_condition>
(10) ORDER BY <order_by_list>
The first noticeable aspect of SQL that is different than other programming languages is the order in which the code is processed. In most programming languages, the code is processed in the order in which it is written. In SQL, the first clause that is processed is the FROM clause, while the SELECT clause, which appears first, is processed almost last.
Each step generates a virtual table that is used as the input to the following step. These virtual tables are not available to the caller (client application or outer query). Only the table generated by the final step is returned to the caller. If a certain clause is not specified in a query, the corresponding step is simply skipped.

Brief Description of Logical Query Processing Phases

Don't worry too much if the description of the steps doesn't seem to make much sense for now. These are provided as a reference. Sections that come after the scenario example will cover the steps in much more detail.
  1. FROM: A Cartesian product (cross join) is performed between the first two tables in the FROM clause, and as a result, virtual table VT1 is generated.
  2. ON: The ON filter is applied to VT1. Only rows for which the <join_condition> is TRUE are inserted to VT2.
  3. OUTER (join): If an OUTER JOIN is specified (as opposed to a CROSS JOIN or an INNER JOIN), rows from the preserved table or tables for which a match was not found are added to the rows from VT2 as outer rows, generating VT3. If more than two tables appear in the FROM clause, steps 1 through 3 are applied repeatedly between the result of the last join and the next table in the FROM clause until all tables are processed.
  4. WHERE: The WHERE filter is applied to VT3. Only rows for which the <where_condition> is TRUE are inserted to VT4.
  5. GROUP BY: The rows from VT4 are arranged in groups based on the column list specified in the GROUP BY clause. VT5 is generated.
  6. CUBE | ROLLUP: Supergroups (groups of groups) are added to the rows from VT5, generating VT6.
  7. HAVING: The HAVING filter is applied to VT6. Only groups for which the <having_condition> is TRUE are inserted to VT7.
  8. SELECT: The SELECT list is processed, generating VT8.
  9. DISTINCT: Duplicate rows are removed from VT8. VT9 is generated.
  10. ORDER BY: The rows from VT9 are sorted according to the column list specified in the ORDER BY clause. A cursor is generated (VC10).
  11. TOP: The specified number or percentage of rows is selected from the beginning of VC10. Table VT11 is generated and returned to the caller.


Therefore, (INNER JOIN) ON will filter the data (the data count of VT will be reduced here itself) before applying WHERE clause. The subsequent join conditions will be executed with filtered data which improves performance. After that only the WHERE condition will apply filter conditions.
(Applying conditional statements in ON / WHERE will not make much difference in few cases. This depends how many tables you have joined and number of rows available in each join tables)



Implicit joins (which is what your first query is known as) become much much more confusing, hard to read, and hard to maintain once you need to start adding more tables to your query. Imagine doing that same query and type of join on four or five different tables ... it's a nightmare.
Using an explicit join (your second example) is much more readable and easy to maintain.



They have a different human-readable meaning.
However, depending on the query optimizer, they may have the same meaning to the machine.
You should always code to be readable.
That is to say, if this is a built-in relationship, use the explicit join. if you are matching on weakly related data, use the where clause.



I know you're talking about MySQL, but anyway: In Oracle 9 explicit joins and implicit joins would generate different execution plans. AFAIK that has been solved in Oracle 10+: there's no such difference anymore.

0 comments:

Post a Comment