Monday 12 November 2018

MySQL: What is the difference between UNION and UNION ALL?

What is the difference between UNION and UNION ALL?

 Answers


UNION removes duplicate records (rows where all columns in the results are the same); UNION ALL does not.
There is a performance hit when using UNION instead of UNION ALL, since the database server must do additional work to remove the duplicate rows, but usually you do not want the duplicates (especially when developing reports).

UNION Example:

SELECT 'foo' AS bar UNION SELECT 'foo' AS bar
Result:
+-----+
| bar |
+-----+
| foo |
+-----+
1 row in set (0.00 sec)

UNION ALL example:

SELECT 'foo' AS bar UNION ALL SELECT 'foo' AS bar
Result:
+-----+
| bar |
+-----+
| foo |
| foo |
+-----+
2 rows in set (0.00 sec)
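The two examples above can be reproduced without a MySQL server. The sketch below assumes Python's built-in `sqlite3` module with an in-memory SQLite database as a stand-in for MySQL; both engines treat `UNION` and `UNION ALL` the same way for these queries.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for MySQL.
conn = sqlite3.connect(":memory:")

# UNION removes the duplicate 'foo' row.
union_rows = conn.execute(
    "SELECT 'foo' AS bar UNION SELECT 'foo' AS bar"
).fetchall()

# UNION ALL keeps both rows.
union_all_rows = conn.execute(
    "SELECT 'foo' AS bar UNION ALL SELECT 'foo' AS bar"
).fetchall()

print(union_rows)      # [('foo',)]
print(union_all_rows)  # [('foo',), ('foo',)]
conn.close()
```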



UNION removes duplicates, whereas UNION ALL does not.
In order to remove duplicates the result set must be sorted (or de-duplicated some other way, such as hashing), and this may have an impact on the performance of the UNION, depending on the volume of data being sorted and the settings of various RDBMS parameters (for Oracle, PGA_AGGREGATE_TARGET with WORKAREA_SIZE_POLICY=AUTO, or SORT_AREA_SIZE and SORT_AREA_RETAINED_SIZE if WORKAREA_SIZE_POLICY=MANUAL).
Basically, the sort is faster if it can be carried out in memory, but the same caveat about the volume of data applies.
Of course, if you need data returned without duplicates then you must use UNION, depending on the source of your data.
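The extra work that duplicate removal costs can be illustrated outside any database. The sketch below is plain Python, not how Oracle or MySQL actually implement UNION: it removes duplicates the way this answer describes (sort, then drop adjacent equal rows) and, for comparison, with a hash set, another strategy some engines use. Both have to look at every row and hold state proportional to the data, which is the overhead UNION ALL avoids.

```python
from typing import List, Tuple

Row = Tuple[object, ...]


def dedupe_by_sort(rows: List[Row]) -> List[Row]:
    """Sort the rows, then keep only the first row of each run of equal rows."""
    result: List[Row] = []
    for row in sorted(rows):
        if not result or result[-1] != row:
            result.append(row)
    return result


def dedupe_by_hash(rows: List[Row]) -> List[Row]:
    """Keep the first occurrence of each row, remembering seen rows in a set."""
    seen = set()
    result: List[Row] = []
    for row in rows:
        if row not in seen:
            seen.add(row)
            result.append(row)
    return result


# UNION ALL is just concatenation; UNION adds a de-duplication pass.
first = [(1, "a"), (2, "b")]
second = [(2, "b"), (3, "c")]
union_all = first + second
print(dedupe_by_sort(union_all))  # [(1, 'a'), (2, 'b'), (3, 'c')]
print(dedupe_by_hash(union_all))  # [(1, 'a'), (2, 'b'), (3, 'c')]
```

Only the sort-based version is guaranteed to return rows in sorted order; the hash-based one keeps first-occurrence order.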



The basic difference between UNION and UNION ALL is that the UNION operation eliminates duplicate rows from the result set, whereas UNION ALL returns all rows from both queries after combining them.



Just to add my two cents to the discussion here: one could understand the UNION operator as a pure, set-oriented union, e.g. set A={2,4,6,8}, set B={1,2,3,4}, A UNION B = {1,2,3,4,6,8}.
When dealing with sets, you would not want the numbers 2 and 4 appearing twice, as an element either is or is not in a set.
In the world of SQL, though, you might want to see all the elements from the two sets together in one "bag" {2,4,6,8,1,2,3,4}. For this purpose SQL (T-SQL included) offers the UNION ALL operator.
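The set-versus-bag distinction in this answer maps directly onto Python's `set` and `list` types. A minimal sketch using the same A and B:

```python
A = [2, 4, 6, 8]
B = [1, 2, 3, 4]

# UNION behaves like set union: each value appears once.
# sorted() is only for display, since a set has no defined order.
union = sorted(set(A) | set(B))

# UNION ALL behaves like pouring both bags together: duplicates are kept.
union_all = A + B

print(union)      # [1, 2, 3, 4, 6, 8]
print(union_all)  # [2, 4, 6, 8, 1, 2, 3, 4]
```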



Not sure that it matters which database you use: UNION and UNION ALL should work on all SQL servers.
You should avoid unnecessary UNIONs; they can be a significant performance drain. As a rule of thumb, use UNION ALL if you are not sure which to use.



(From Microsoft SQL Server Books Online)
UNION [ALL]
Specifies that multiple result sets are to be combined and returned as a single result set.

ALL
Incorporates all rows into the results. This includes duplicates. If not specified, duplicate rows are removed.
UNION takes longer because duplicate-row elimination, like DISTINCT, is applied to the results.
SELECT * FROM Table1
UNION
SELECT * FROM Table2
is equivalent to:
SELECT DISTINCT * FROM (
    SELECT * FROM Table1
    UNION ALL
    SELECT * FROM Table2) DT
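The equivalence can be checked on a small data set. This sketch assumes SQLite via Python's `sqlite3` rather than SQL Server, and the tables `Table1` and `Table2` with a single integer column `x` are hypothetical; results are sorted in Python because neither query specifies an order.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE Table1 (x INTEGER);
    CREATE TABLE Table2 (x INTEGER);
    INSERT INTO Table1 VALUES (1), (2), (2);
    INSERT INTO Table2 VALUES (2), (3);
    """
)

via_union = conn.execute(
    "SELECT * FROM Table1 UNION SELECT * FROM Table2"
).fetchall()

via_distinct = conn.execute(
    """
    SELECT DISTINCT * FROM (
        SELECT * FROM Table1
        UNION ALL
        SELECT * FROM Table2) DT
    """
).fetchall()

# Compare as sorted lists, since neither query has an ORDER BY.
print(sorted(via_union))     # [(1,), (2,), (3,)]
print(sorted(via_distinct))  # [(1,), (2,), (3,)]
conn.close()
```

Note that UNION also removes the duplicate 2 that exists inside Table1 alone, not just duplicates across the two tables.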

A side effect of applying DISTINCT to the results can be a sort operation on the results.
UNION ALL results come back in whatever order the rows are produced, but UNION results often look as if ORDER BY 1, 2, 3, ..., n (n = number of columns) had been applied. You can see this side effect even when there are no duplicate rows.
It is only a side effect, though: neither operator guarantees any order unless the query has an ORDER BY.



It helps to understand this with a Venn diagram.



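One more thing I would like to add:
UNION: the result set often comes back sorted in ascending order, as a side effect of removing duplicates, but this is not guaranteed.
UNION ALL: the result set is not sorted; the two query outputs are typically just appended.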
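Whatever order a particular engine happens to produce, the only portable way to get a sorted result is an explicit ORDER BY, which goes after the last SELECT and applies to the combined rows. A sketch, assuming SQLite through Python's `sqlite3`:

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# ORDER BY 1 sorts the whole combined result by its first column.
rows = conn.execute(
    """
    SELECT 3 AS n
    UNION ALL SELECT 1
    UNION ALL SELECT 2
    ORDER BY 1
    """
).fetchall()

print(rows)  # [(1,), (2,), (3,)]
conn.close()
```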



Suppose that you have two tables, Teacher and Student.
Both have 4 columns, some with different names, like this:
Teacher - ID(int), Name(varchar(50)), Address(varchar(50)), PositionID(varchar(50))
Student - ID(int), Name(varchar(50)), Email(varchar(50)), PositionID(int)
You can apply UNION or UNION ALL to these two tables because they have the same number of columns, even though some column names and data types differ. Corresponding columns must have compatible types: here SQL Server would implicitly convert Teacher's varchar PositionID to int, which fails if any value is not numeric.
When you apply the UNION operation to 2 tables, it discards duplicate entries (rows whose column values are all identical).
Like this:
SELECT * FROM Student
UNION
SELECT * FROM Teacher
When you apply the UNION ALL operation to 2 tables, it returns all entries, including duplicates.
Like this:
SELECT * FROM Student
UNION ALL
SELECT * FROM Teacher
Performance:
UNION ALL generally performs better than UNION, because UNION does additional work to remove duplicate values.
You can check this in SQL Server Management Studio by comparing the estimated execution plans (press Ctrl+L).
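The Student/Teacher example can be tried with a few rows of made-up data. This sketch uses SQLite through Python's `sqlite3` instead of SQL Server and, to sidestep type-conversion differences, declares PositionID as INTEGER in both tables; all rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE Student (ID INTEGER, Name TEXT, Email TEXT, PositionID INTEGER);
    CREATE TABLE Teacher (ID INTEGER, Name TEXT, Address TEXT, PositionID INTEGER);
    -- Hypothetical rows; (2, 'Sam', 'n/a', 1) appears in both tables.
    INSERT INTO Student VALUES (1, 'Ann', 'ann@example.com', 1), (2, 'Sam', 'n/a', 1);
    INSERT INTO Teacher VALUES (2, 'Sam', 'n/a', 1), (3, 'Joe', '12 Main St', 2);
    """
)

union_rows = conn.execute(
    "SELECT * FROM Student UNION SELECT * FROM Teacher"
).fetchall()
union_all_rows = conn.execute(
    "SELECT * FROM Student UNION ALL SELECT * FROM Teacher"
).fetchall()

print(len(union_rows))      # 3: the shared Sam row appears once
print(len(union_all_rows))  # 4: every row from both tables
conn.close()
```

The combined result takes its column names from the first SELECT, so the third column is labelled Email even for rows that came from Teacher.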



If there is no ORDER BY, a UNION ALL may bring rows back as it goes, 
whereas a UNION would make you wait until the very end of the query 
before giving you the whole result set at once. This can make a difference 
in a time-out situation - a UNION ALL keeps the connection alive, 
as it were.
So if you have a time-out issue, and there's no sorting, and duplicates 
aren't an issue, UNION ALL may be rather helpful.
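The contrast between a UNION ALL that can hand back rows as it goes and a UNION that waits can be modelled with Python generators. This is a conceptual sketch only, assuming duplicates are removed by sorting; real engines choose their strategy in the execution plan. The `tracked` helper is hypothetical and simply records each row as it is read.

```python
from itertools import chain
from typing import Iterable, Iterator, List, Tuple

Row = Tuple[int, ...]


def union_all(*inputs: Iterable[Row]) -> Iterator[Row]:
    # Each row is passed on as soon as its input produces it.
    return chain(*inputs)


def union_by_sort(*inputs: Iterable[Row]) -> Iterator[Row]:
    # Every input must be read and sorted before the first row can be emitted.
    rows = sorted(chain(*inputs))
    for i, row in enumerate(rows):
        if i == 0 or row != rows[i - 1]:
            yield row


def tracked(rows: List[Row], log: List[Row]) -> Iterator[Row]:
    # Hypothetical helper: records each row as the "server" reads it.
    for row in rows:
        log.append(row)
        yield row


log_all: List[Row] = []
first = next(union_all(tracked([(1,), (2,)], log_all), tracked([(2,), (3,)], log_all)))
print(first, len(log_all))  # (1,) 1: only one row had to be read

log_union: List[Row] = []
first = next(union_by_sort(tracked([(1,), (2,)], log_union), tracked([(2,), (3,)], log_union)))
print(first, len(log_union))  # (1,) 4: all rows read before the first result
```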



In very simple words the difference between UNION and UNION ALL is
 that UNION will omit duplicate records whereas UNION ALL will include
duplicate records.
