Database storage is one of the primary failure point in the database
environment. Hence, making the storage architecture consistent and at the same
time giving it maximum redundancy must be embedded within the design itself.
When I call redundancy, I don’t specifically say implementing anything specific
to RAID or SAN or other mechanisms. Redundancy must be achieved given your
hardware and budget currently in hand – if it even means making just multiple
backup copies at the simplest level.
When
we talk about storage, the one thing that gets missed out is the need to plan
for your system database like TempDB. More often than not we see DBA’s using
the default configuration without understanding the consequences. Typically,
tempDB handles large number of operations like row-versioning, online index
operation, generating worktables while sort and other query operation, temp
table creation etc. Hence, damage to this critical database will bring our
whole instance down. Hence understanding the storage strategies from a
high-performance and high-fault tolerance angle is very critical. This article
will try to address some of these concerns if not all.
Storage
requirement for Databases
To
start with, in SSMS, we see a separate note as System Databases. Now there are
databases created as part of adding additional database installation and
configurations. The standard system databases include: master, model, msdb,
tempdb and resource. Then we have the distribution (with replication),
ReportServer, ReportServerTempDB (with RS installation) databases that get
added based on the additional configuration we make to the system.
Although
system databases are critical to the operation of SQL Server, they tend to be
very small databases. Therefore, there is no need to determine their precise
storage requirements. You SHOULD NOT configure the default database sizes and
options – Change them based on your environment. Typically, larger enterprise
environments have a set of standards for the location of system databases for
all SQL Server installations. This set of standards provides a number of
benefits, including the ability to run administrative scripts consistently
across multiple installations.
Typically,
the performance of system databases is not an issue because these databases are
small and their active portion resides in memory. Reliability is the most
important consideration for system databases because a catastrophic failure of
these databases can bring our whole server instance down. All system databases
are critical to the operation of SQL Server, not all of them will cause a
catastrophic failure of your SQL Server in the event of corruption or other
failures. MASTER, you are guaranteed to have a server downtime if this database
goes corrupt. Should ensure that it resides on a redundant drive array and you
should back up the master database regularly to ensure minimal data loss.
Moving to MODEL database, though SQL Server runs without much problem, creating
of new databases will not be possible if this database is corrupt. MSDB,
another interesting database which doesn’t bring the server down in event of
failure, but there are a lot of dependant services to this like the SSIS
packages and replication via the SQL Server Agent service. These will seize to
function properly.
Though
we are talking about the various system databases and High-availability
solutions for them a design typically involves a RAID or a storage area network
(SAN) solution. However, you should avoid RAID 0 because it does not provide
redundancy. RAID 1 is sufficient for the system databases and keeping them
separate would help.
TempDB
World!!!
TempDB
as the name suggests is for temporary storage and is often overlooked aspect of
configuration. As we quoted before this database will come to use when we have
hash joins, cursors, table variables, temp tables, snapshot isolation, row
versioning etc. Hence it is important for us to decide the space, location and
performance of this database more than any other system database.
Now,
there is no rule-of-thumb where we can give a one size fits all. This very much
depends on whether you are using operations such as temporary tables, table
variables, cursors, row versioning, Multiple Active Result Sets (MARS), and the
SORT_IN_TEMPDB index option. Initially, you need to keep track on the growth of
the tempDB database during peak hours and it is best to pre-allocate enough
space for the tempdb database to prevent it from automatically growing too
frequently and adversely affecting performance.
We
can use the sys.dm_db_file_space_usage dynamic management view to monitor the
disk space usage of the tempdb database files. We can also use the
sys.dm_db_session_space_usage and sys.dm_db_task_space_usage dynamic management
views to identify queries and structures that use a large amount of disk space.
From
size to location, this is purely dependant on the budget, hardware and the
database activity. TempDB is a write-intensive database and even when we are
considering RAID scenarios we need to be careful. Like RAID 5 is particularly
not suited for write performance and RAID 0 is bad because of its no redundancy
option. Contrarily, if we are using a SAN solution, we must decide whether to
store the tempdb database files on the server’s local drives or on the SAN
solution. Heavy utilization of the tempdb database will generate a lot of
network traffic; therefore, we should store the database files locally.
However, there might be local storage limitations that might force us
otherwise.
Some
of the general high-level recommendations include: storing tempDB on its own
dedicated solid-state drives, separating data and log files, creating multiple
data files to take advantage of the I/O and multi-CPU machines.
Though
with TempDB, we have just touched the surface, there is an interesting
whitepaper on the TechNet site: http://www.microsoft.com/technet/prodtechnol/sql/bestpractice/tempdb_capacity_planning_index.mspx
Storage
requirement for User Databases
Configuring
the best possible options for the user databases is vital for performance,
scalable and highly-available systems. But before even we get started it is
critical to understand the difference between the various disk drive
technologies, such as enhanced integrated drive electronics (EIDE), Serial
Advanced Technology Attachment (SATA), and small computer system interface
(SCSI), Storage Area Network (SAN) and select the technology that is best
suited to your requirement.
Another
important consideration is the size of the transaction log. We must ensure that
the log does not run out of free space because that will have an adverse effect
on availability of the database. The size of the transaction log is affected by
the type and volume of data modifications in our database. In general, the size
of user databases depends on factors such as the structure of the tables and
the number of rows in them, the number of data rows that can physically fit on
a SQL Server page, the number of indexes, the fill factor used by indexes, and
whether database snapshots are being taken etc.
It
is often very difficult to change our storage design without incurring
substantial downtime once we get them locked and system is in production.
Therefore, we should carefully plan whether you are going to take advantage of
database engine features, such as filegroups, partitions, the number of
database files, and various database options. Therefore, you should take
advantage of instant database file initialization, if possible, to improve the
performance of auto-grow operations.
Note:
Instant database file initialization reclaims the used disk space without
filling that space with zeros; instead, the disk content is overwritten as new
data is written to the files. As a result, instant database file initialization
allows fast execution of file operations, such as creating a database file and
increasing the size of an existing file. And this is available with Windows XP
or Windows Server 2003 or later versions.
Other
Storage requirements
The
SQL Server 2005 setup process installs components in a separate directory that
is based on the next available instance ID like MSSQL.1 and so-on. Although the
physical location may vary depending on your configuration, we should install
the various components using the default location on the system drive as it
becomes easy from an administration point of view. The HDD requirement depends
on our configuration we selected during install. The various components take
approx space like: Database engine with data files and FTS (~130 MB), Analysis
Services and data files (~35MB), Reporting server and Reporting Manager
(~10MB), Integration Services (~10MB) and so-on.
Hence
when we say installing on the default location, it must be taken into
consideration when we talk about the various HDD requirements. What we have not
mentioned is the pre-requisite installations like .NET Framework 2.0 and other
pre-req files. SQL Server 2005 includes several full-text indexing enhancements
that are mainly related to performance and integration with databases.
Full-text indexing now supports multiple instances, and the database engine has
been optimized for faster indexing and querying. Full-text catalogs are
natively part of backup files and detached databases. Additional new features
include thesaurus support, XML data type support, and accent
sensitivity/insensitivity.
The
full-text indexing process is I/O intensive because it consists of reading data
from SQL Server databases and then writing to the index in the file system.
Therefore, it is a good practice to create full-text catalogs on their own
physical drive arrays. Although full-text indexes are backed up as part of a
database backup, you should provide an appropriate level of disk drive
redundancy to ensure availability and to minimize the impact on your
high-availability solution in the event of hardware failure.
Trace
and event log files are critical for monitoring and identifying trends in
performance, and troubleshooting potential failures. Therefore, you should
consider their requirements when designing your storage solution. SQL Server
Profiler uses space from the operating system temporary directory. This default
setting is usually sufficient. If you are running a trace for a long period of
time, you should ensure that you have sufficient space on the operating system
drive. To avoid problems, you can place the temporary directory on a drive that
is not a system drive. To do this, you need to change the value for the TEMP
environment variable. SQL Server Profiler requires at least 10 MB of free
space. If the free space falls below 10 MB while you are using SQL Server
Profiler, all SQL Server Profiler functions will stop.
Other
log file includes the Windows Error Logs, SQL Server Agent Logs, Database Mail
Logs, SAL Server Engine Error Logs etc. The size of the default Windows Event
Log files is 512 KB, with the files being overwritten as required. I have seen
this been changed to 2-3MB based on the requirement of retention period for
these log data.
Next
important component that needs attention from the storage engine is the backup
operation. You need to determine the correct storage requirement for your
backup files because it directly affects the speed of your backup and restore
operations, and might impact the performance of online operations. Most
enterprises back up data to a disk before archiving it to a tape device. This
configuration has a minimal impact on the performance of your high-availability
solution and it ensures that backups are completed more quickly. You can also
back up databases to network shares or mapped drives, but you must evaluate the
impact that this will have on your network and its reliability. You can have a
separate network segment for such network traffic.
To
calculate the space requirement, you must determine the number and size of the
databases that are part of your backup strategy. You must also plan for future
growth so that your backup storage location does not run out of free space. The
amount of space required for your backups will also be affected by factors such
as your database backup strategy, backup rotation, and database mode. You
should not back up databases to compressed NTFS volumes. It is a common
practice to keep the latest database backups on the high-availability solution
so that you can recover your system in a minimal amount of time. Your DRP
(Disaster Recovery Process) will outline the types of backup strategy you have
and how to execute in event of a failure or natural calamity.
Ideally,
your backup storage should be on a separate disk array, on a channel that is
separate from your database files. Backup performance can also be improved by
backing up to multiple disk drives, an appropriate RAID array, or even multiple
backup devices on a single drive. With SQL Server 2005, we allocate a reader
thread to each disk device that is used to store the database. A writer thread
is allocated to each backup device. SQL Server 2005 can back up to 64 backup
devices. This fundamentally improves the speed of the backup operation as we
are doing the operations in parallel. The backup topic in SQL Server is quite
interesting and will get into a separate article later.
Conclusion
In
this article we just took a sneak preview to what and where all one needs to
channelize energy when working with the storage system. Infact, we have just
got started on this. I will follow this with a Part II where we will get into
some of the more specifics of this topic.
Here
are some interesting links for your reference:
INF:
How to Move Tempdb to a Different Device -
http://support.microsoft.com/kb/187824/
BOL:
Moving System Databases -
http://msdn2.microsoft.com/en-us/library/ms345408.aspx
BOL:
Database File Initialization -
http://msdn2.microsoft.com/en-us/library/ms175935.aspx
Webcast:
Understanding IO and Storage system with SQL Server 2005