SQL Server 2000 Best Practices
Best Practices is a great and wonderful thing … that
is if everyone working on the project adheres to them.
In many of the sessions and usergroups I've been
asked by many for the best or rather should I say better practices of coding
while using SQL Server 2000. The views listed below are mine and I'am sure any
SQL Server Guru might not argue it otherwise. Just thought of adding sections
for coding practices. And here I am ... You can also consider these as
guidelines for development in SQL Server. Hope you get good milage out of
this article ...
Catch the first version of the article at MSDN India site.
Since more practices get added I've hosted this in this site also.
Note:In this
article I'll assume that you already know the T-SQL syntax and we are
working with SQL Server 2000.
1.
Normalize your tables
There are two common excuses for not normalizing
databases: performance and pure laziness. You'll pay for the second one sooner
or later; and, about performance, don't optimize what's not slow. And, more
frequent than the inverse, the resulting design is slower. DBMS’s were designed
to be used with normalized databases and SQL Server is no exception, so design
with normalization in mind.
2.
Avoid using cursors
Use cursors wisely. Cursors are fundamentally evil.
They force the database engine to repeatedly fetch rows, negotiate blocking,
manage locks, and transmit results. They consume network bandwidth as the
results are transmitted back to the client, where they consume RAM, disk space,
and screen real estate. Consider the resources consumed by each cursor you
build and multiply this demand by the number of simultaneous users. Smaller is
better. And good DBAs, most of the time, know what they are doing. But, if you
are reading this, you are not a DBA, right?
Having said this the other question that comes is, If
I were to use cursors then .... ? Well here are my 20Cents on cursor usage. Use
the appropriate cursors for the job in hand.
·
Don't use scrollable cursors unless
required
·
Use readonly cursors if you donot
intend to update. This would be 90% of the situations.
·
Try to use Forward Only cursor when
using cursors
·
Don’t forget to close and
deallocate the cursors used.
·
Try to reduce the number of columns
and records fetched in a cursor
3.
Index Columns
Create Index on columns that are going to be highly
selective. Indexes are vital to efficient data access; however, there is a cost
associated with creating and maintaining an index structure. For every insert,
update and delete, each index must be updated. In a data warehouse, this is
acceptable, but in a transactional database, you should weigh the cost of
maintaining an index on tables that incur heavy changes. The bottom line is to
use effective indexes judiciously. On analytical databases, use as many indexes
as necessary to read the data quickly and efficiently.
Now a classic example is DONOT index an column like
"Gender". This would have a selectivity of 50% and if your table is having 10
Million records, you can be least assured that using this index you may have to
travel half the number of rows ... Hence maintaining such indexes can slow your
performance.
4.
Use transactions
Use transaction judiciously. This will save you
when things get wrong. Working with data for some time you'll soon discover
some unexpected situation which will make your stored procured crash. See that
the transaction starts as late as possible and ends as early as possible. This
would reduce the requirement to lock down the resources while accessing. In
short,
5.
Analyze deadlocks
Access your tables on the same order
always. When working with stored procedures and
transactions, you may find this soon. Any SQL programmer / database analyst
would have come across this problem. If the order changes then there wold be a
cyclic wait for resources to be released and the users would experience a
permanent hang in the application. Deadlocks can be tricky to find if the lock
sequence is not carefully designed. To summarize, Deadlock occurs when two
users have locks on separate objects and each user is trying to lock the other
user's objects. SQL Server automatically detects and breaks the deadlock. The
terminated transaction will be automatically rolled back and an error code 1205
will be issued.
6.
GOTO Usage
Avoid using the infamous GOTO. This is a time-proven
means of adding disorder to program flow. There are some cases where
intelligent use of GOTO is preferable to dogmatically refusing to use it. On
the other hand, unintelligent use of GOTO is a quick ticket to unreadable code.
7.
Increase timeouts
When querying a database, the default timeout is often
low, like 30 seconds. Remember that report queries may run longer than this,
specially when your database grows. Hence increase this value to an acceptable
value.
8.
Avoid NULLable columns
When possible, normalize your table and separate your
nullable columns. They consume an extra byte on each NULLable column in each
row and have more overhead associated when querying data. It will be more
flexible and faster, and will reduce the NULLable columns. I'm not saying that
NULLs are the evil incarnation. I believe they can simplify coding when
"missing data" is part of your business rules.
9.
TEXT datatype
Unless you are using it for really large data. The
TEXT datatype is not flexible to query, is slow and wastes a lot of space if
used incorrectly. Sometimes a VARCHAR will handle your data better. You can
also look at the "text in row" feature with the table options for SQL Server
2000. But still I would stick to the first statement, Avoid using them on first
place.
10.
SELECT * Usage
Its very difficult to get out of this habit, but
believe me this is very essential. Please DONOT use this syntax. Always qualify
the full list of columns. Using all columns increases network traffic, requires
more buffers and processing, and could prove error prone if the table or view
definition changes.
11.
Temporary tables usage
Unless strictly necessary. More often than not a
subquery can substitute a temporary table. In SQL Server 2000, there are
alternatives like the TABLE variable datatype which can provide in-memory
solutions for small tables inside stored procedures too. If I were to recollect
some of the advantages of using the same:
·
A table variable behaves like a
local variable. It has a well-defined scope, which is the function, stored
procedure, or batch in which it is declared. Within its scope, a table variable
may be used like a regular table.
·
However, table may not be used in
the following statements: INSERT INTO table_variable EXEC stored_procedure
SELECT select_list INTO table_variable statements.
·
Table variables are cleaned up
automatically at the end of the function, stored procedure, or batch in which
they are defined.
·
Table variables used in stored
procedures result in fewer recompilations of the stored procedures than their
counterparts temporary tables.
·
Transactions involving table
variables last only for the duration of an update on the table variable. Thus,
table variables require less locking and logging resources
12.
Using UDF
UDF can replace stored procedures. But be careful in
their usage. Sometimes UDFs can take a toll on your applications performance.
And UDFs have to prefixed with the owners name. This is not a drawback but a
requirement. I support usage of SPs more than UDFs.
13.
Multiple User Scenario
Sometimes two users will edit the same record at the
same time. While writing back, the last writer wins and some of the updates
will be lost. It's easy to detect this situation: create a timestamp column and
check it before you write. Code for these practical situations and test your
application for these scenarios.
14.
Use SCOPE_IDENTITY
Dont do SELECT max(ID) from MasterTable when inserting
in a Details table. This is a common mistake, and will fail when concurrent
users are inserting data at the same instance. Use one of SCOPE_IDENTITY or
IDENT_CURRENT. My choice would be SCOPE_IDENTITY as this would give you the
identity value from the current context in prespective.
15.
Analyze Query Plans
The SQL Server query analyzer is a powerful tool. And
surely is your friend, and you'll learn a lot of how it works and how the query
and index design can affect performance through it. Understand the execution
plan that the execution plan window shows for potential bottlenecks.
16.
Parameterized queries
Parameterize all your queries using the sp_executesql.
This would help the optimzer to chace the execution plans and use the same when
requested teh second time. You can cache-in the time required to parse, compile
and place the execution plan. Avoid using of D-SQL as much as possible.
17.
Keep Procedures Small
Keep SPs small in size and scope. Two users invoking
the same stored procedure simultaneously will cause the procedure to create two
query plans in cache. It is much more efficient to have a stored procedure call
other ones then to have one large procedure.
18.
Bulk INSERT
Use DTS or the BCP utility and you'll have both a
flexible and fast solution. Try avoiding use of Insert statement for the Buld
loading feature, they are not efficent and are not designed for the same.
19.
Using JOINS
Make sure that there are n-1 join criteria if there
are n tables.
Make sure that ALL tables included in the statement
are joined. Make sure that only tables that
·
Have columns in the select clause
·
Have columns referenced in the
where clause
·
Allow two unrelated tables to be
joined together are included.
20.
Trap Errors
Make sure that the @@ERROR global variable is checked
after every statement which causes an update to the database (INSERT, UPDATE,
DELETE). Make sure that rollbacks (if appropriate) are performed prior to
inserting rows into an exception table
21.
Small Result Set
Retrieving needlessly large result sets (for example,
thousands of rows) for browsing on the client adds CPU and network I/O load,
makes the application less capable of remote use, and limits multi-user
scalability. It is better to design the application to prompt the user for
sufficient input so queries are submitted that generates modest result sets.
22.
Negative Arguments
Minimize the use of not equal operations, <> or
!=. SQL Server has to scan a table or index to find all values to see if they
are not equal to the value given in the expression. Try rephrasing the
expression using ranges:
WHERE
KeyColumn < 'TestValue' AND KeyColumn > 'TestValue'
23.
Date Assumption
Prevent issues with the interpretation of centuries in
dates, do not specify years using two digits. Assuming dates formats is the
first place to break an application. Hence avoid making this assumption.
24.
SP_ Name
DONOT start the name of a stored procedure with SP_.
This is because all the system related stored procedures follow this
convention. Hence a valid procedure today may clash with the naming convention
of a system procedure that gets bundled with a Service pack / Security patch
tomorrow. Hence do not follow this convention.
25.
Apply the latest Security Packs / Service
Packs
Even though this point applies to the network and the
database administrators, it is always better to keep up-to date on the
software’s. With the "slammer" virus and many more still outside, it is one of
the best practices to be up-to date on the same. Consider this strongly.
26.
Using Count(*)
The only 100 percent accurate way to check the number
of rows in a table is to use a COUNT(*) operation. The statement might consume
significant resources if your tables are very big because scanning a large
table or index can consume a lot of I/O. Avoid these type of queries to the
maximum. Use short circuting methods as EXISTS etc. Here is one other way you
can find the total number of rows in a table. SQL Server Books Online (BOL)
documents the structure of sysindexes; the value of sysindexes.indid will
always be 0 for a table and 1 for a clustered index. If a table doesn't have a
clustered index, its entry in sysindexes will always have an indid value of 0.
If a table does have a clustered index, its entry in sysindexes will always
have an indid value of 1.
SELECT
object_name(id) ,rowcnt
FROM sysindexes
WHERE indid
IN (1,0) AND OBJECTPROPERTY(id, 'IsUserTable') = 1
27.
Ownership Chaining
Try using this feature (available from SQL Server 2000
SP3), for permission management within a single database. Avoid using this
feature to manage permissions across database.
28.
SQL Injection
Security has been a prime concern for everyone. Hence
validate all the incoming parameters at all levels of the application. Limit
the scope of possible damage by permitting only minimally privileged accounts
to send user input to the server. Adding to it, run SQL Server itself with the
least necessary privileges.
29.
Fill-factor
The 'fill factor' option specifies how full SQL Server
will make each index page. When there is no free space to insert new row on the
index page, SQL Server will create new index page and transfer some rows from
the previous page to the new one. This operation is called page splits. You can
reduce the number of page splits by setting the appropriate fill factor option
to reserve free space on each index page. The fill factor is a value from 1
through 100 that specifies the percentage of the index page to be left empty.
The default value for fill factor is 0. It is treated similarly to a fill
factor value of 100, the difference in that SQL Server leaves some space within
the upper level of the index tree for FILLFACTOR = 0. The fill factor
percentage is used only at the time the index is created. If the table contains
read-only data (or data that very rarely changed), you can set the 'fill
factor' option to 100. When the table's data modified very often, you can
decrease the 'fill factor' option to 70 percent, for example. Having explained
page splits in detail I would warn you in over looking at this point because
more free space means that SQL Server has to traverse through more pages to get
the same amount of data. Hence try to strike a balance and arrive at an
appropriate value.
30.
Start-up Procedures
Verify all the stored procedures for safety reasons.
31.
Analyze Blocking
More often than not any implementers nightmare would
be to see a blocking process. Blocking occurs when a process must wait for
another process to complete. The process must wait because the resources it
needs are exclusively used by another process. A blocked process will resume
operation after the resources are released by the other process. Sometimes this
can become cyclic and the system comes to a stand still. The only solution is
to analyze your indexing strategy and table design. Consider these points
strongly.
32.
Avoid Un-necessary Indexes
Avoid creating un-necessary indexes on table thinking
they would improve your performance. Understand that creating Indexes and
maintaining them are overheads that you incur. And these surely do reduce the
throughput for the whole application. You can create a simple test on a large
table and find it for yourself how multiple indexes on the same column decrease
performance.
33.
Consider Indexed Views
Sometimes we would require an view to be indexed. This
feature is bundled with SQL Server 2000. The result set of the indexed view is
persist in the database and indexed for fast access. Because indexed views
depend on base tables, you should create indexed views with SCHEMABINDING
option to prevent the table or column modification that would invalidate the
view. Hence using them can reduce a lot of load on the base tables but
increases the maintainability.
34.
WITH SORT_IN_TEMPDB Option
Consider using this option when you create an index
and when tempdb is on a different set of disks than the user database. This is
more of a tuning recommendation. Using this option can reduce the time it takes
to create an index, but increases the amount of disk space used to create an
index. Time is precious, disk is cheaper.
35.
Reduce Number of Columns
Try to reduce the number of columns in a table. The fewer the number of columns
in a table, the less space the table will use, since more rows will fit on a
single data page, and less I/O overhead will be required to access the table's
data. This should be considered strongly by applications that talk across
different machines. More the unwanted data passed more is the network latency
observed.
36. To be added
....
|