What is the difference between UNION and UNION ALL?
Answers
UNION removes duplicate records (where all columns in the results are the same); UNION ALL does not.
There is a performance hit when using UNION instead of UNION ALL,
since the database server must do additional work to remove the duplicate rows,
but usually you do not want the duplicates (especially when developing reports).
UNION Example:
SELECT 'foo' AS bar UNION SELECT 'foo' AS bar
Result:
+-----+
| bar |
+-----+
| foo |
+-----+
1 row in set (0.00 sec)
UNION ALL example:
SELECT 'foo' AS bar UNION ALL SELECT 'foo' AS bar
Result:
+-----+
| bar |
+-----+
| foo |
| foo |
+-----+
2 rows in set (0.00 sec)
UNION removes duplicates, whereas UNION ALL does not.
In order to remove duplicates the result set must typically be sorted (or hashed), and this may
have an impact on the performance of the UNION, depending on the
volume of data being sorted, and the settings of various RDBMS
parameters (for Oracle, PGA_AGGREGATE_TARGET with WORKAREA_SIZE_POLICY=AUTO,
or SORT_AREA_SIZE and SORT_AREA_RETAINED_SIZE if WORKAREA_SIZE_POLICY=MANUAL).
Basically, the sort is faster if it can be carried out in memory,
but the same caveat about the volume of data applies.
Of course, if you need data returned without duplicates then
you must use UNION, unless the source of your data already guarantees that no duplicates can occur.
I would have commented on the first post to qualify the "is much less performant"
comment, but have insufficient reputation (points) to do so.
Just to add my two cents to the discussion here: one could
understand the UNION operator as a pure, SET-oriented UNION,
e.g. set A={2,4,6,8}, set B={1,2,3,4}, A UNION B = {1,2,3,4,6,8}.
When dealing with sets, you would not want numbers 2 and 4 appearing
twice, as an element either is or is not in a set.
In the world of SQL, though, you might want to see all the elements from
the two sets together in one "bag" {2,4,6,8,1,2,3,4}. And for this purpose
T-SQL offers the operator UNION ALL.
Not sure that it matters which database you use:
UNION and UNION ALL should work on all SQL servers.
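The set and the bag from the A/B example above can be reproduced with literal rows. This is only a minimal sketch, assuming a database that allows SELECT without a FROM clause (for example SQL Server, MySQL or PostgreSQL); the aliases A, B and n are illustrative names, not anything from the posts above.

```sql
-- A = {2,4,6,8} and B = {1,2,3,4}, built as derived tables of literals.

-- Set-style union: 2 and 4 appear only once.
SELECT n FROM (SELECT 2 AS n UNION ALL SELECT 4 UNION ALL SELECT 6 UNION ALL SELECT 8) AS A
UNION
SELECT n FROM (SELECT 1 AS n UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT 4) AS B
ORDER BY n;
-- six rows: 1, 2, 3, 4, 6, 8

-- Bag-style union: every row from both sides is kept.
SELECT n FROM (SELECT 2 AS n UNION ALL SELECT 4 UNION ALL SELECT 6 UNION ALL SELECT 8) AS A
UNION ALL
SELECT n FROM (SELECT 1 AS n UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT 4) AS B;
-- eight rows; 2 and 4 each appear twice (row order is not guaranteed)
```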
You should avoid unnecessary UNIONs; they are a huge
performance leak. As a rule of thumb, use UNION ALL
if you are not sure which to use.
(From Microsoft SQL Server Books Online)
UNION [ALL]
Specifies that multiple result sets are to be combined and returned as a single result set.
ALL
Incorporates all rows into the results. This includes duplicates. If ALL is not specified, UNION
will take longer, because a duplicate-row search like DISTINCT
is applied to the results.
SELECT * FROM Table1
UNION
SELECT * FROM Table2
is equivalent to:
SELECT DISTINCT * FROM (
    SELECT * FROM Table1
    UNION ALL
    SELECT * FROM Table2
) DT
A side effect of applying DISTINCT over the results is often a sorting operation.
UNION ALL results come back in arbitrary order, but UNION results frequently look as if
ORDER BY 1, 2, 3, ..., n (n = number of columns in the tables)
had been applied. You can see this side effect even when you don't have any duplicate rows.
This ordering is an implementation detail and is not guaranteed; add an explicit ORDER BY if you need one.
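Because the sorted look of UNION output is only a by-product of how duplicates happen to be removed, a query that depends on row order should state it. A minimal sketch reusing Table1 and Table2 from above, assuming (for illustration only) that the first column is the one to sort by:

```sql
-- The ORDER BY belongs to the whole UNION, not just the last SELECT.
-- Ordinal 1 refers to the first column of the combined result.
SELECT * FROM Table1
UNION
SELECT * FROM Table2
ORDER BY 1;
```

The same ORDER BY clause works after UNION ALL, so switching between the two operators does not change the guaranteed ordering.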
It is easiest to understand with a Venn diagram.
Here is the link to the source. There is a good description.
One more thing I would like to add:
UNION: the result set often comes back sorted in ascending order (a side effect of removing duplicates, not a guarantee).
UNION ALL: the result set is not sorted; the output of the two queries just gets appended.
Suppose that you have two tables, Teacher and Student.
Both have 4 columns with different names, like this:
Teacher - ID(int), Name(varchar(50)), Address(varchar(50)), PositionID(varchar(50))
You can apply UNION or UNION ALL to two tables that have the
same number of columns, even if the column names differ; the data types of corresponding columns only need to be compatible.
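To illustrate the point about differing column names, here is a small sketch with literal rows instead of the real tables. The names StudentNo and FullName and the values are hypothetical, chosen only for the example, and it assumes a database that allows SELECT without a FROM clause.

```sql
-- Both SELECTs return two columns of compatible types (number, text),
-- but the second one uses different column names.
SELECT 1 AS ID, 'Alice' AS Name
UNION ALL
SELECT 2 AS StudentNo, 'Bob' AS FullName;
-- The result set has two rows, and its columns are named ID and Name:
-- the names come from the first SELECT.
```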
When you apply a UNION operation to 2 tables, it discards all
duplicate entries (rows where every column value is the same as in another row).
Like this
SELECT * FROM Student
UNION
SELECT * FROM Teacher
the result will be
When you apply a UNION ALL operation to 2 tables,
it returns all entries, including duplicates (a row is kept whether or not
an identical row exists in the other table).
Like this
SELECT * FROM Student
UNION ALL
SELECT * FROM Teacher
Output
Performance:
Obviously UNION ALL performs better than UNION, as
UNION does the additional task of removing the duplicate values.
You can check that from the estimated execution plan by pressing Ctrl+L in MSSQL.
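Besides the estimated plan, the actual timings of the two forms can be compared side by side. A rough sketch, assuming SQL Server and the Student and Teacher tables described above; the numbers you see will depend entirely on your data and server.

```sql
-- Report parse/compile and execution times in the Messages tab.
SET STATISTICS TIME ON;

-- Form 1: duplicates are removed.
SELECT * FROM Student
UNION
SELECT * FROM Teacher;

-- Form 2: rows are returned as they are.
SELECT * FROM Student
UNION ALL
SELECT * FROM Teacher;

SET STATISTICS TIME OFF;
```

On small tables the difference may be too small to measure, so it is worth trying this against realistic data volumes.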
If there is no ORDER BY, a UNION ALL may bring rows back as it goes,
whereas a UNION would make you wait until the very end of the query
before giving you the whole result set at once. This can make a difference
in a time-out situation - a UNION ALL keeps the connection alive, as it were.
So if you have a time-out issue, and there's no sorting, and duplicates
aren't an issue, UNION ALL may be rather helpful.
In very simple words the difference between UNION and UNION ALL is
that UNION will omit duplicate records whereas UNION ALL will include
duplicate records.