WeB LoG'S JuUiER: Should I use a #temp table or a @table variable?

Sunday, August 9, 2009

Should I use a #temp table or a @table variable?

From: http://databases.aspfaq.com/

In a stored procedure, you often have a need for storing a set of data within the procedure, without necessarily needing that data to persist beyond the scope of the procedure. If you actually need a table structure, there are basically four ways you can "store" this data: local temporary tables (#table_name), global temporary tables (##table_name), permanent tables (table_name), and table variables (@table_name).

There is a partial list of questions and answers about table variables, including some differences between table variables and #temp tables, in KB #305977 - INF: Frequently Asked Questions - SQL Server 2000 - Table Variables.

Local Temporary Tables

CREATE TABLE #people
(
id INT,
name VARCHAR(32)
)

A temporary table is created and populated on disk, in the system database tempdb — with a session-specific identifier packed onto the name, to differentiate between similarly-named #temp tables created from other sessions. The data in this #temp table (in fact, the table itself) is visible only to the current scope (usually a stored procedure, or a set of nested stored procedures). The table gets cleared up automatically when the current procedure goes out of scope, but you should manually clean up the data when you're done with it:

DROP TABLE #people

This will be better on resources ("release early") than if you let the system clean up *after* the current session finishes the rest of its work and goes out of scope.

A common use of #temp tables is to summarize/compact/reorganize data coming from another stored procedure. So, take this example, which pares down the results of the system procedure sp_who2 into only the SPID, Status, and HostName of *active* processes that are *not* part of the regular operation of the system:

CREATE TABLE #sp_who3
(
SPID INT,
Status VARCHAR(32) NULL,
Login SYSNAME NULL,
HostName SYSNAME NULL,
BlkBy SYSNAME NULL,
DBName SYSNAME NULL,
Command VARCHAR(32) NULL,
CPUTime INT NULL,
DiskIO INT NULL,
LastBatch VARCHAR(14) NULL,
ProgramName VARCHAR(32) NULL,
SPID2 INT
)

INSERT #sp_who3 EXEC sp_who2 'active'

SELECT SPID, Status, HostName FROM #sp_who3
WHERE spid > 15

DROP TABLE #sp_who3

One of the main benefits of using a #temp table, as opposed to a permanent table, is the reduction in the amount of locking required (since the current user is the only user accessing the table), and also there is much less logging involved. (You could also increase this performance by placing tempdb on a separate drive... but that's another story.)

One minor problem with #temp tables is that, because of the session-specific identifier that is tacked onto the name, the name you give it is limited to 116 characters, including the # sign (while other table types are limited to 128). If you try, you will see this:

Server: Msg 193, Level 15, State 1, Line 1
The object or column name starting with '#' is too long. The maximum length is 116 characters.

Hopefully this won't be a limitation in your environment, because I can't imagine a table name that long being useful or manageable.

Another potential problem with #temp tables is that, if you enter a transaction and use a #temp table, and then cancel without ever issuing a ROLLBACK or COMMIT, you could be causing unnecessary locks in tempdb (for more information, see KB #159747).

Global Temporary Tables

CREATE TABLE ##people
(
id INT,
name VARCHAR(32)
)

Global temporary tables operate much like local temporary tables; they are created in tempdb and cause less locking and logging than permanent tables. However, they are visible to all sessions, until the creating session goes out of scope (and the global ##temp table is no longer being referenced by other sessions). If two different sessions try the above code, if the first is still active, the second will receive the following:

Server: Msg 2714, Level 16, State 6, Line 1
There is already an object named '##people' in the database.

I have yet to see a valid justification for the use of a global ##temp table. If the data needs to persist to multiple users, then it makes much more sense, at least to me, to use a permanent table. You can make a global ##temp table slightly more permanent by creating it in an autostart procedure, but I still fail to see how this is advantageous over a permanent table. With a permanent table, you can deny permissions; you cannot deny users from a global ##temp table.

Permanent Tables

CREATE TABLE people
(
id INT,
name VARCHAR(32)
)

A permanent table is created in the local database, however you can (unlike #temp tables) choose to create a table in another database, or even on another server, for which you have access. Like global ##temp tables, a permanent table will persist the session in which it is created, unless you also explicitly drop the table. (For contention and concurrency reasons, creating a "temporary" permanent table in this manner doesn't really make a lot of sense.) If you are planning to create a permanent table when the procedure runs, you should check to see if it exists, in order to avoid errors like the one mentioned above. For more information about checking for the existence of both local #temp tables and permanent tables, see Article #2458.

Like global ##temp tables, there seems to be little reason to use a permanent table unless the data is going to persist... and if that is the case, why not create the permanent table before the stored procedure is ever run, thereby eliminating all the CREATE / DROP logic?

Table Variables

DECLARE @people TABLE
(
id INT,
name VARCHAR(32)
)

A table variable is created in memory, and so performs slightly better than #temp tables (also because there is even less locking and logging in a table variable). A table variable might still perform I/O to tempdb (which is where the performance issues of #temp tables make themselves apparent), though the documentation is not very explicit about this.

Table variables are automatically cleared when the procedure or function goes out of scope, so you don't have to remember to drop or clear the data (which can be a good thing or a bad thing; remember "release early"?). The tempdb transaction log is less impacted than with #temp tables; table variable log activity is truncated immediately, while #temp table log activity persists until the log hits a checkpoint, is manually truncated, or when the server restarts.

Table variables are the only way you can use DML statements (INSERT, UPDATE, DELETE) on temporary data within a user-defined function. You can create a table variable within a UDF, and modify the data using one of the above statements. For example, you could do this:

CREATE FUNCTION dbo.example1
(
)
RETURNS INT
AS
BEGIN
DECLARE @t1 TABLE (i INT)
INSERT @t1 VALUES(1)
INSERT @t1 VALUES(2)
UPDATE @t1 SET i = i + 5
DELETE @t1 WHERE i < max =" MAX(i)">

However, try that with a #temp table:

CREATE FUNCTION dbo.example2
(
)
RETURNS INT
AS
BEGIN
CREATE TABLE #t1 (i INT)
INSERT #t1 VALUES(1)
INSERT #t1 VALUES(2)
UPDATE #t1 SET i = i + 5
DELETE #t1 WHERE i < max =" MAX(i)">

Results:

Server: Msg 2772, Level 16, State 1, Procedure example2, Line 7
Cannot access temporary tables from within a function.

Or try accessing a permanent table:

CREATE TABLE table1
(
id INT IDENTITY,
name VARCHAR(32)
)
GO

CREATE FUNCTION dbo.example3
(
)
RETURNS INT
AS
BEGIN
INSERT table1(name) VALUES('aaron')
RETURN SCOPE_IDENTITY()
END
GO

Results:

Server: Msg 443, Level 16, State 2, Procedure example3, Line 8
Invalid use of 'INSERT' within a function.

Table variables can lead to fewer stored procedure recompilations than temporary tables (see KB #243586 and KB #305977), and — since they cannot be rolled back — do not bother with the transaction log.

So, why not use table variables all the time? Well, when something sounds too good to be true, it probably is. Let's visit some of the limitations of table variables (part of this list was derived from KB #305977):

Table variables are only allowed in SQL Server 2000+, with compatibility level set to 80 or higher.
You cannot use a table variable in either of the following situations:

INSERT @table EXEC sp_someProcedure

SELECT * INTO @table FROM someTable
You cannot truncate a table variable.
Table variables cannot be altered after they have been declared.

You cannot explicitly add an index to a table variable, however you can create a system index through a PRIMARY KEY CONSTRAINT, and you can add as many indexes via UNIQUE CONSTRAINTs as you like. What the optimizer does with them is another story. One thing to note is that you cannot explicitly name your constraints, e.g.:

DECLARE @myTable TABLE
(
CPK1 int,
CPK2 int,
CONSTRAINT myPK PRIMARY KEY (CPK1, CPK2)
)

-- yields:
Server: Msg 156, Level 15, State 1, Line 6
Incorrect syntax near the keyword 'CONSTRAINT'.

-- yet the following works:
DECLARE @myTable TABLE
(
CPK1 int,
CPK2 int,
PRIMARY KEY (CPK1, CPK2)
)

You cannot use a user-defined function (UDF) in a CHECK CONSTRAINT, computed column, or DEFAULT CONSTRAINT.
You cannot use a user-defined type (UDT) in a column definition.
Unlike a #temp table, you cannot drop a table variable when it is no longer necessary—you just need to let it go out of scope.

You cannot generate a table variable's column list dynamically, e.g. you can't do this:

SELECT * INTO @tableVariable

-- yields:

Server: Msg 170, Level 15, State 1, Line 1
Line 1: Incorrect syntax near '@tableVariable'.

You also can't build the table variable inside dynamic SQL, and expect to use it outside that scope, e.g.:

DECLARE @colList VARCHAR(8000), @sql VARCHAR(8000)
SET @colList = 'a INT,b INT,c INT'
SET @sql = 'DECLARE @foo TABLE('+@colList+')'
EXEC(@sql)
INSERT @foo SELECT 1,2,3

-- this last line fails:

Server: Msg 137, Level 15, State 2, Line 5
Must declare the variable '@foo'.

This is because the rest of the script knows nothing about the temporary objects created within the dynamic SQL. Like other local variables, table variables declared inside of a dynamic SQL block (EXEC or sp_executeSQL) cannot be referenced from outside, and vice-versa. So you would have to write the whole set of statements to create and operate on the table variable, and perform it with a single call to EXEC or sp_executeSQL.

The system will not generate automatic statistics on table variables. Likewise, you cannot manually create statistics (statistics are used to help the optimizer pick the best possible query plan).
An INSERT into a table variable will not take advantage of parallelism.
A table variable will always have a cardinality of 1, because the table doesn't exist at compile time.

Table variables must be referenced by an alias, except in the FROM clause. Consider the following two scripts:

CREATE TABLE #foo(id INT)
DECLARE @foo TABLE(id INT)
INSERT #foo VALUES(1)
INSERT #foo VALUES(2)
INSERT #foo VALUES(3)
INSERT @foo SELECT * FROM #foo

SELECT id
FROM @foo
INNER JOIN #foo
ON @foo.id = #foo.id

DROP TABLE #foo

The above fails with the following error:

Server: Msg 137, Level 15, State 2, Line 11
Must declare the variable '@foo'.

This query, on the other hand, works fine:

SELECT id
FROM @foo f
INNER JOIN #foo
ON f.id = #foo.id

Table variables are not visible to the calling procedure in the case of nested procs. The following is legal with #temp tables:

CREATE PROCEDURE faq_outer
AS
BEGIN
CREATE TABLE #outer
(
letter CHAR(1)
)

EXEC faq_inner

SELECT letter FROM #outer

DROP TABLE #outer
END
GO

CREATE PROCEDURE faq_inner
AS
BEGIN
INSERT #outer VALUES('a')
END
GO

EXEC faq_outer

Results:

letter
------
a

(1 row(s) affected)

However, you cannot do this with table variables. The parser will find the error before you can even create it:

CREATE PROCEDURE faq_outer
AS
BEGIN
DECLARE @outer TABLE
(
letter CHAR(1)
)

EXEC faq_inner

SELECT letter FROM @outer
END
GO

CREATE PROCEDURE faq_inner
AS
BEGIN
INSERT @outer VALUES('a')
END
GO

Results:

Server: Msg 137, Level 15, State 2, Procedure faq_inner, Line 4
Must declare the variable '@outer'.

For more information about sharing data between stored procedures, please see this article by Erland Sommarskog.

Conclusion

Like many other areas of technology, there is no "right" answer here. For data that is not meant to persist beyond the scope of the procedure, you are typically choosing between #temp tables and table variables. Your ultimate decision should depend on performance and reasonable load testing. As your data size gets larger, and/or the repeated use of the temporary data increases, you will find that the use of #temp tables makes more sense. Depending on your environment, that threshold could be anywhere — however you will obviously need to use #temp tables if any of the above limitations represents a significant roadblock.