Windows Memory Architecture

Windows Memory Architecture Software
Windows Memory Architecture Free

Querying for whether Unified Memory Architecture (UMA) is supported can help determine how to handle some resources. A boolean, set by the driver, can be read from the D3D11FEATUREDATAD3D11OPTIONS2 structure to determine if the hardware supports UMA. Jan 04, 2010 The Windows memory architecture – part 1: the basics. Posted on Updated on. Most IT professionals dealing with Windows in a decent way have already heard of terms like “virtual memory”, “physical memory”, “working set”, “reserving memory”, “committed memory”, “page file”, “swapping”, “regions”, etc.

-->

APPLIES TO: SQL Server Azure SQL Database Azure SQL Data Warehouse Parallel Data Warehouse

Windows Virtual Memory Manager

The committed regions of address space are mapped to the available physical memory by the Windows Virtual Memory Manager (VMM).

For more information on the amount of physical memory supported by different operating systems, see the Windows documentation on Memory Limits for Windows Releases.

Virtual memory systems allow the over-commitment of physical memory, so that the ratio of virtual-to-physical memory can exceed 1:1. As a result, larger programs can run on computers with a variety of physical memory configurations. However, using significantly more virtual memory than the combined average working sets of all the processes can cause poor performance.

SQL Server Memory Architecture

SQL Server dynamically acquires and frees memory as required. Typically, an administrator does not have to specify how much memory should be allocated to SQL Server, although the option still exists and is required in some environments.

One of the primary design goals of all database software is to minimize disk I/O because disk reads and writes are among the most resource-intensive operations. SQL Server builds a buffer pool in memory to hold pages read from the database. Much of the code in SQL Server is dedicated to minimizing the number of physical reads and writes between the disk and the buffer pool. SQL Server tries to reach a balance between two goals:

Keep the buffer pool from becoming so big that the entire system is low on memory.
Minimize physical I/O to the database files by maximizing the size of the buffer pool.

Note

In a heavily loaded system, some large queries that require a large amount of memory to run cannot get the minimum amount of requested memory and receive a time-out error while waiting for memory resources. To resolve this, increase the query wait Option. For a parallel query, consider reducing the max degree of parallelism Option.

Note

In a heavily loaded system under memory pressure, queries with merge join, sort and bitmap in the query plan can drop the bitmap when the queries do not get the minimum required memory for the bitmap. This can affect the query performance and if the sorting process can not fit in memory, it can increase the usage of worktables in tempdb database, causing tempdb to grow. To resolve this problem add physical memory or tune the queries to use a different and faster query plan.

Providing the maximum amount of memory to SQL Server

By using AWE and the Locked Pages in Memory privilege, you can provide the following amounts of memory to the SQL Server Database Engine.

Note

The following table includes a column for 32-bit versions, which are no longer available.

32-bit ¹	64-bit
Conventional memory	All SQL Server editions. Up to process virtual address space limit: - 2 GB - 3 GB with /3gb boot parameter ² - 4 GB on WOW64 ³	All SQL Server editions. Up to process virtual address space limit: - 7 TB with IA64 architecture (IA64 not supported in SQL Server 2012 (11.x) and above) - Operating system maximum with x64 architecture ⁴
AWE mechanism (Allows SQL Server to go beyond the process virtual address space limit on 32-bit platform.)	SQL Server Standard, Enterprise, and Developer editions: Buffer pool is capable of accessing up to 64 GB of memory.	Not applicable ⁵
Lock pages in memory operating system (OS) privilege (allows locking physical memory, preventing OS paging of the locked memory.) ⁶	SQL Server Standard, Enterprise, and Developer editions: Required for SQL Server process to use AWE mechanism. Memory allocated through AWE mechanism cannot be paged out. Granting this privilege without enabling AWE has no effect on the server.	Only used when necessary, namely if there are signs that sqlservr process is being paged out. In this case, error 17890 will be reported in the Errorlog, resembling the following example: `A significant part of sql server process memory has been paged out. This may result in a performance degradation. Duration: #### seconds. Working set (KB): ####, committed (KB): ####, memory utilization: ##%.`

¹ 32-bit versions are not available starting with SQL Server 2014 (12.x).
² /3gb is an operating system boot parameter. For more information, visit the MSDN Library.
³ WOW64 (Windows on Windows 64) is a mode in which 32-bit SQL Server runs on a 64-bit operating system.
⁴ SQL Server Standard Edition supports up to 128 GB. SQL Server Enterprise Edition supports the operating system maximum.
⁵ Note that the sp_configure awe enabled option was present on 64-bit SQL Server, but it is ignored.
⁶ If lock pages in memory privilege (LPIM) is granted (either on 32-bit for AWE support or on 64-bit by itself), we recommend also setting max server memory. For more information on LPIM, refer to Server Memory Server Configuration Options

Note

Older versions of SQL Server could run on a 32-bit operating system. Accessing more than 4 gigabytes (GB) of memory on a 32-bit operating system required Address Windowing Extensions (AWE) to manage the memory. This is not necessary when SQL Server is running on 64-bit operation systems. For more information about AWE, see Process Address Space and Managing Memory for Large Databases in the SQL Server 2008 documentation.

Changes to Memory Management starting with SQL Server 2012 (11.x)

In earlier versions of SQL Server ( SQL Server 2005 (9.x), SQL Server 2008 and SQL Server 2008 R2), memory allocation was done using five different mechanisms:

Single-Page Allocator (SPA), including only memory allocations that were less than, or equal to 8-KB in the SQL Server process. The max server memory (MB) and min server memory (MB) configuration options determined the limits of physical memory that the SPA consumed. THe buffer pool was simultaneously the mechanism for SPA, and the largest consumer of single-page allocations.
Multi-Page Allocator (MPA), for memory allocations that request more than 8-KB.
CLR Allocator, including the SQL CLR heaps and its global allocations that are created during CLR initialization.
Memory allocations for thread stacks in the SQL Server process.
Direct Windows allocations (DWA), for memory allocation requests made directly to Windows. These include Windows heap usage and direct virtual allocations made by modules that are loaded into the SQL Server process. Examples of such memory allocation requests include allocations from extended stored procedure DLLs, objects that are created by using Automation procedures (sp_OA calls), and allocations from linked server providers.

Starting with SQL Server 2012 (11.x), Single-Page allocations, Multi-Page allocations and CLR allocations are all consolidated into a 'Any size' Page Allocator, and it's included in memory limits that are controlled by max server memory (MB) and min server memory (MB) configuration options. This change provided a more accurate sizing ability for all memory requirements that go through the SQL Server memory manager.

Important

Carefully review your current max server memory (MB) and min server memory (MB) configurations after you upgrade to SQL Server 2012 (11.x) through SQL Server 2017. This is because starting in SQL Server 2012 (11.x), such configurations now include and account for more memory allocations compared to earlier versions.These changes apply to both 32-bit and 64-bit versions of SQL Server 2012 (11.x) and SQL Server 2014 (12.x), and 64-bit versions of SQL Server 2016 (13.x) through SQL Server 2017.

The following table indicates whether a specific type of memory allocation is controlled by the max server memory (MB) and min server memory (MB) configuration options:

Type of memory allocation	SQL Server 2005 (9.x), SQL Server 2008 and SQL Server 2008 R2	Starting with SQL Server 2012 (11.x)
Single-page allocations	Yes	Yes, consolidated into 'any size' page allocations
Multi-page allocations	No	Yes, consolidated into 'any size' page allocations
CLR allocations	No	Yes
Thread stacks memory	No	No
Direct allocations from Windows	No	No

Starting with SQL Server 2012 (11.x), SQL Server might allocate more memory than the value specified in the max server memory setting. This behavior may occur when the Total Server Memory (KB) value has already reached the Target Server Memory (KB) setting (as specified by max server memory). If there is insufficient contiguous free memory to meet the demand of multi-page memory requests (more than 8 KB) because of memory fragmentation, SQL Server can perform over-commitment instead of rejecting the memory request.

As soon as this allocation is performed, the Resource Monitor background task starts to signal all memory consumers to release the allocated memory, and tries to bring the Total Server Memory (KB) value below the Target Server Memory (KB) specification. Therefore, SQL Server memory usage could briefly exceed the max server memory setting. In this situation, the Total Server Memory (KB) performance counter reading will exceed the max server memory and Target Server Memory (KB) settings.

This behavior is typically observed during the following operations:

Large Columnstore index queries.
Columnstore index (re)builds, which use large volumes of memory to perform Hash and Sort operations.
Backup operations that require large memory buffers.
Tracing operations that have to store large input parameters.

Changes to 'memory_to_reserve' starting with SQL Server 2012 (11.x)

In earlier versions of SQL Server ( SQL Server 2005 (9.x), SQL Server 2008 and SQL Server 2008 R2), the SQL Server memory manager set aside a part of the process virtual address space (VAS) for use by the Multi-Page Allocator (MPA), CLR Allocator, memory allocations for thread stacks in the SQL Server process, and Direct Windows allocations (DWA). This part of the virtual address space is also known as 'Mem-To-Leave' or 'non-Buffer Pool' region.

Best comic book database software. The virtual address space that is reserved for these allocations is determined by the memory_to_reserve configuration option. The default value that SQL Server uses is 256 MB. To override the default value, use the SQL Server -g startup parameter. Refer to the documentation page on Database Engine Service Startup Options for information on the -g startup parameter.

Because starting with SQL Server 2012 (11.x), the new 'any size' page allocator also handles allocations greater than 8 KB, the memory_to_reserve value does not include the multi-page allocations. Except for this change, everything else remains the same with this configuration option.

The following table indicates whether a specific type of memory allocation falls into the memory_to_reserve region of the virtual address space for the SQL Server process:

Type of memory allocation	SQL Server 2005 (9.x), SQL Server 2008 and SQL Server 2008 R2	Starting with SQL Server 2012 (11.x)
Single-page allocations	No	No, consolidated into 'any size' page allocations
Multi-page allocations	Yes	No, consolidated into 'any size' page allocations
CLR allocations	Yes	Yes
Thread stacks memory	Yes	Yes
Direct allocations from Windows	Yes	Yes

Dynamic Memory Management

The default memory management behavior of the SQL Server Database Engine is to acquire as much memory as it needs without creating a memory shortage on the system. The SQL Server Database Engine does this by using the Memory Notification APIs in Microsoft Windows.

When SQL Server is using memory dynamically, it queries the system periodically to determine the amount of free memory. Maintaining this free memory prevents the operating system (OS) from paging. If less memory is free, SQL Server releases memory to the OS. If more memory is free, SQL Server may allocate more memory. SQL Server adds memory only when its workload requires more memory; a server at rest does not increase the size of its virtual address space.

Max server memory controls the SQL Server memory allocation, compile memory, all caches (including the buffer pool), query execution memory grants, lock manager memory, and CLR¹ memory (essentially any memory clerk found in sys.dm_os_memory_clerks).

¹ CLR memory is managed under max_server_memory allocations starting with SQL Server 2012 (11.x).

The following query returns information about currently allocated memory:

Memory for thread stacks¹, CLR², extended procedure .dll files, the OLE DB providers referenced by distributed queries, automation objects referenced in Transact-SQL statements, and any memory allocated by a non SQL Server DLL are not controlled by max server memory.

¹ Refer to the documentation page on how to Configure the max worker threads Server Configuration Option, for information on the calculated default worker threads for a given number of affinitized CPUs in the current host. SQL Server stack sizes are as follows:

SQL Server Architecture	OS Architecture	Stack Size
x86 (32-bit)	x86 (32-bit)	512 KB
x86 (32-bit)	x64 (64-bit)	768 KB
x64 (64-bit)	x64 (64-bit)	2048 KB
IA64 (Itanium)	IA64 (Itanium)	4096 KB

² CLR memory is managed under max_server_memory allocations starting with SQL Server 2012 (11.x).

SQL Server uses the memory notification API QueryMemoryResourceNotification to determine when the SQL Server Memory Manager may allocate memory and release memory.

When SQL Server starts, it computes the size of virtual address space for the buffer pool based on a number of parameters such as amount of physical memory on the system, number of server threads and various startup parameters. SQL Server reserves the computed amount of its process virtual address space for the buffer pool, but it acquires (commits) only the required amount of physical memory for the current load.

The instance then continues to acquire memory as needed to support the workload. As more users connect and run queries, SQL Server acquires the additional physical memory on demand. A SQL Server instance continues to acquire physical memory until it either reaches its max server memory allocation target or the OS indicates there is no longer an excess of free memory; it frees memory when it has more than the min server memory setting, and the OS indicates that there is a shortage of free memory.

As other applications are started on a computer running an instance of SQL Server, they consume memory and the amount of free physical memory drops below the SQL Server target. The instance of SQL Server adjusts its memory consumption. If another application is stopped and more memory becomes available, the instance of SQL Server increases the size of its memory allocation. SQL Server can free and acquire several megabytes of memory each second, allowing it to quickly adjust to memory allocation changes.

Effects of min and max server memory

The min server memory and max server memory configuration options establish upper and lower limits to the amount of memory used by the buffer pool and other caches of the SQL Server Database Engine. The buffer pool does not immediately acquire the amount of memory specified in min server memory. The buffer pool starts with only the memory required to initialize. As the SQL Server Database Engine workload increases, it keeps acquiring the memory required to support the workload. The buffer pool does not free any of the acquired memory until it reaches the amount specified in min server memory. Once min server memory is reached, the buffer pool then uses the standard algorithm to acquire and free memory as needed. The only difference is that the buffer pool never drops its memory allocation below the level specified in min server memory, and never acquires more memory than the level specified in max server memory.

Note

SQL Server as a process acquires more memory than specified by max server memory option. Both internal and external components can allocate memory outside of the buffer pool, which consumes additional memory, but the memory allocated to the buffer pool usually still represents the largest portion of memory consumed by SQL Server.

The amount of memory acquired by the SQL Server Database Engine is entirely dependent on the workload placed on the instance. A SQL Server instance that is not processing many requests may never reach min server memory.

If the same value is specified for both min server memory and max server memory, then once the memory allocated to the SQL Server Database Engine reaches that value, the SQL Server Database Engine stops dynamically freeing and acquiring memory for the buffer pool.

If an instance of SQL Server is running on a computer where other applications are frequently stopped or started, the allocation and deallocation of memory by the instance of SQL Server may slow the startup times of other applications. Also, if SQL Server is one of several server applications running on a single computer, the system administrators may need to control the amount of memory allocated to SQL Server. In these cases, you can use the min server memory and max server memory options to control how much memory SQL Server can use. The min server memory and max server memory options are specified in megabytes. For more information, see Server Memory Configuration Options.

Memory used by SQL Server objects specifications

The following list describes the approximate amount of memory used by different objects in SQL Server. The amounts listed are estimates and can vary depending on the environment and how objects are created:

Lock (as maintained by the Lock Manager): 64 bytes + 32 bytes per owner
User connection: Approximately (3 * network_packet_size + 94 kb)

The network packet size is the size of the tabular data scheme (TDS) packets that are used to communicate between applications and the SQL Server Database Engine. The default packet size is 4 KB, and is controlled by the network packet size configuration option.

When multiple active result sets (MARS) are enabled, the user connection is approximately (3 + 3 * num_logical_connections) * network_packet_size + 94 KB

Effects of min memory per query

The min memory per query configuration option establishes the minimum amount of memory (in kilobytes) that will be allocated for the execution of a query. This is also known as the minimum memory grant. All queries must wait until the minimum memory requested can be secured, before execution can start, or until the value specified in the query wait server configuration option is exceeded. The wait type that is accumulated in this scenario is RESOURCE_SEMAPHORE.

Important

Do not set the min memory per query server configuration option too high, especially on very busy systems, because doing so could lead to:

Increased competition for memory resources.
Decreased concurrency by increasing the amount of memory for every single query, even if the required memory at runtime is lower that this configuration.

For recommendations on using this configuration, see Configure the min memory per query Server Configuration Option.

Memory grant considerations

For row mode execution, the initial memory grant cannot be exceeded under any condition. If more memory than the initial grant is needed to execute hash or sort operations, then these will spill to disk. A hash operation that spills is supported by a Workfile in TempDB, while a sort operation that spills is supported by a Worktable.

A spill that occurs during a Sort operation is known as a Sort Warning. Sort warnings indicate that sort operations do not fit into memory. This does not include sort operations involving the creation of indexes, only sort operations within a query (such as an ORDER BY clause used in a SELECT statement).

A spill that occurs during a hash operation is known as a Hash Warning. These occur when a hash recursion or cessation of hashing (hash bailout) has occurred during a hashing operation.

Hash recursion occurs when the build input does not fit into available memory, resulting in the split of input into multiple partitions that are processed separately. If any of these partitions still do not fit into available memory, it is split into sub-partitions, which are also processed separately. This splitting process continues until each partition fits into available memory or until the maximum recursion level is reached.
Hash bailout occurs when a hashing operation reaches its maximum recursion level and shifts to an alternate plan to process the remaining partitioned data. These events can cause reduced performance in your server.

For batch mode execution, the initial memory grant can dynamically increase up to a certain internal threshold by default. This dynamic memory grant mechanism is designed to allow memory resident execution of hash or sort operations running in batch mode. If these operations still do not fit into memory, then these will spill to disk.

For more information on execution modes, see the Query Processing Architecture Guide.

Buffer management

The primary purpose of a SQL Server database is to store and retrieve data, so intensive disk I/O is a core characteristic of the Database Engine. And because disk I/O operations can consume many resources and take a relatively long time to finish, SQL Server focuses on making I/O highly efficient. Buffer management is a key component in achieving this efficiency. The buffer management component consists of two mechanisms: the buffer manager to access and update database pages, and the buffer cache (also called the buffer pool), to reduce database file I/O.

How buffer management works

A buffer is an 8 KB page in memory, the same size as a data or index page. Thus, the buffer cache is divided into 8 KB pages. The buffer manager manages the functions for reading data or index pages from the database disk files into the buffer cache and writing modified pages back to disk. A page remains in the buffer cache until the buffer manager needs the buffer area to read in more data. Data is written back to disk only if it is modified. Data in the buffer cache can be modified multiple times before being written back to disk. For more information, see Reading Pages and Writing Pages.

When SQL Server starts, it computes the size of virtual address space for the buffer cache based on a number of parameters such as the amount of physical memory on the system, the configured number of maximum server threads, and various startup parameters. SQL Server reserves this computed amount of its process virtual address space (called the memory target) for the buffer cache, but it acquires (commits) only the required amount of physical memory for the current load. You can query the bpool_commit_target and bpool_committed columns in the sys.dm_os_sys_info catalog view to return the number of pages reserved as the memory target and the number of pages currently committed in the buffer cache, respectively.

The interval between SQL Server startup and when the buffer cache obtains its memory target is called ramp-up. During this time, read requests fill the buffers as needed. For example, a single 8 KB page read request fills a single buffer page. This means the ramp-up depends on the number and type of client requests. Ramp-up is expedited by transforming single page read requests into aligned eight page requests (making up one extent). This allows the ramp-up to finish much faster, especially on machines with a lot of memory. For more information about pages and extents, refer to Pages and Extents Architecture Guide.

Because the buffer manager uses most of the memory in the SQL Server process, it cooperates with the memory manager to allow other components to use its buffers. The buffer manager interacts primarily with the following components:

Resource manager to control overall memory usage and, in 32-bit platforms, to control address space usage.
Database manager and the SQL Server Operating System (SQLOS) for low-level file I/O operations.
Log manager for write-ahead logging.

Supported Features

The buffer manager supports the following features:

The buffer manager is non-uniform memory access (NUMA) aware. Buffer cache pages are distributed across hardware NUMA nodes, which allows a thread to access a buffer page that is allocated on the local NUMA node rather than from foreign memory.
The buffer manager supports Hot Add Memory, which allows users to add physical memory without restarting the server.
The buffer manager supports large pages on 64-bit platforms. The page size is specific to the version of Windows.
Note
Prior to SQL Server 2012 (11.x), enabling large pages in SQL Server requires trace flag 834.
The buffer manager provides additional diagnostics that are exposed through dynamic management views. You can use these views to monitor a variety of operating system resources that are specific to SQL Server. For example, you can use the sys.dm_os_buffer_descriptors view to monitor the pages in the buffer cache.

Disk I/O

The buffer manager only performs reads and writes to the database. Other file and database operations such as open, close, extend, and shrink are performed by the database manager and file manager components.

Disk I/O operations by the buffer manager have the following characteristics:

All I/Os are performed asynchronously, which allows the calling thread to continue processing while the I/O operation takes place in the background.
All I/Os are issued in the calling threads unless the affinity I/O option is in use. The affinity I/O mask option binds SQL Server disk I/O to a specified subset of CPUs. In high-end SQL Server online transactional processing (OLTP) environments, this extension can enhance the performance of SQL Server threads issuing I/Os.
Multiple page I/Os are accomplished with scatter-gather I/O, which allows data to be transferred into or out of noncontiguous areas of memory. This means that SQL Server can quickly fill or flush the buffer cache while avoiding multiple physical I/O requests.

Long I/O requests

The buffer manager reports on any I/O request that has been outstanding for at least 15 seconds. This helps the system administrator distinguish between SQL Server problems and I/O subsystem problems. Error message 833 is reported and appears in the SQL Server error log as follows:

SQL Server has encountered ## occurrence(s) of I/O requests taking longer than 15 seconds to complete on file [##] in database [##] (#). The OS file handle is 0x00000. The offset of the latest long I/O is: 0x00000.

A long I/O may be either a read or a write; it is not currently indicated in the message. Long-I/O messages are warnings, not errors. They do not indicate problems with SQL Server but with the underlying I/O system. The messages are reported to help the system administrator find the cause of poor SQL Server response times more quickly, and to distinguish problems that are outside the control of SQL Server. As such, they do not require any action, but the system administrator should investigate why the I/O request took so long, and whether the time is justifiable.

Causes of Long-I/O Requests

A long-I/O message may indicate that an I/O is permanently blocked and will never complete (known as lost I/O), or merely that it just has not completed yet. It is not possible to tell from the message which scenario is the case, although a lost I/O will often lead to a latch timeout.

Long I/Os often indicate a SQL Server workload that is too intense for the disk subsystem. An inadequate disk subsystem may be indicated when:

Multiple long I/O messages appear in the error log during a heavy SQL Server workload.
Perfmon counters show long disk latencies, long disk queues, or no disk idle time.

Long I/Os may also be caused by a component in the I/O path (for example, a driver, controller, or firmware) continually postponing servicing an old I/O request in favor of servicing newer requests that are closer to the current position of the disk head. The common technique of processing requests in priority based upon which ones are closest to the current position of the read/write head is known as 'elevator seeking.' This may be difficult to corroborate with the Windows System Monitor (PERFMON.EXE) tool because most I/Os are being serviced promptly. Long I/O requests can be aggravated by workloads that perform large amounts of sequential I/O, such as backup and restore, table scans, sorting, creating indexes, bulk loads, and zeroing out files.

Isolated long I/Os that do not appear related to any of the previous conditions may be caused by a hardware or driver problem. The system event log may contain a related event that helps to diagnose the problem.

Memory pressure detection

Memory pressure is a condition resulting from memory shortage, and can result in:

Extra I/Os (such as very active lazy writer background thread)
Higher recompile ratio
Longer running queries (if memory grant waits exist)
Extra CPU cycles

This situation can be triggered by external or internal causes. External causes include:

Available physical memory (RAM) is low. This causes the system to trim working sets of currently running processes, which may result in overall slowdown. SQL Server may reduce the commit target of the buffer pool and start trimming internal caches more often.
Overall available system memory (which includes the system page file) is low. This may cause the system to fail memory allocations, as it is unable to page out currently allocated memory.Internal causes include:
Responding to the external memory pressure, when the SQL Server Database Engine sets lower memory usage caps.
Memory settings were manually lowered by reducing the max server memory configuration.
Changes in memory distribution of internal components between the several caches.

The SQL Server Database Engine implements a framework dedicated to detecting and handling memory pressure, as part of its dynamic memory management. This framework includes the backgroud task called Resource Monitor. The Resource Monitor task monitors the state of external and internal memory indicators. Once one of these indicators changes status, it calculates the corresponding notification and it broadcasts it. These notifications are internal messages from each of the engine components, and stored in ring buffers.

Two ring buffers hold information relevant to dynamic memory management:

The Resource Monitor ring buffer, which tracks Resource Monitor activity like was memory pressure signaled or not. This ring buffer has status information depending on the current condition of RESOURCE_MEMPHYSICAL_HIGH, RESOURCE_MEMPHYSICAL_LOW, RESOURCE_MEMPHYSICAL_STEADY, or RESOURCE_MEMVIRTUAL_LOW.
The Memory Broker ring buffer, which contains records of memory notifications for each Resource Governor resource pool. As internal memory pressure is detected, low memory notification is turned on for components that allocate memory, to trigger actions meant to balance the memory between caches.

Memory brokers monitor the demand consumption of memory by each component and then based on the information collected, it calculates and optimal value of memory for each of these components. There is a set of brokers for each Resource Governor resource pool. This information is then broadcast to each of the components, which grow or shrink their usage as required.For more information about memory brokers, see sys.dm_os_memory_brokers.

Error Detection

Database pages can use one of two optional mechanisms that help insure the integrity of the page from the time it is written to disk until it is read again: torn page protection and checksum protection. These mechanisms allow an independent method of verifying the correctness of not only the data storage, but hardware components such as controllers, drivers, cables, and even the operating system. The protection is added to the page just before writing it to disk, and verified after it is read from disk.

SQL Server will retry any read that fails with a checksum, torn page, or other I/O error four times. If the read is successful in any one of the retry attempts, a message will be written to the error log and the command that triggered the read will continue. If the retry attempts fail, the command will fail with error message 824.

The kind of page protection used is an attribute of the database containing the page. Checksum protection is the default protection for databases created in SQL Server 2005 (9.x) and later. The page protection mechanism is specified at database creation time, and may be altered by using ALTER DATABASE SET. You can determine the current page protection setting by querying the page_verify_option column in the sys.databases catalog view or the IsTornPageDetectionEnabled property of the DATABASEPROPERTYEX function.

Note

If the page protection setting is changed, the new setting does not immediately affect the entire database. Instead, pages adopt the current protection level of the database whenever they are written next. This means that the database may be composed of pages with different kinds of protection.

Torn Page Protection

Torn page protection, introduced in SQL Server 2000, is primarily a way of detecting page corruptions due to power failures. For example, an unexpected power failure may leave only part of a page written to disk. When torn page protection is used, a specific 2-bit signature pattern for each 512-byte sector in the 8-kilobyte (KB) database page and stored in the database page header when the page is written to disk. When the page is read from disk, the torn bits stored in the page header are compared to the actual page sector information. The signature pattern alternates between binary 01 and 10 with every write, so it is always possible to tell when only a portion of the sectors made it to disk: if a bit is in the wrong state when the page is later read, the page was written incorrectly and a torn page is detected. Torn page detection uses minimal resources; however, it does not detect all errors caused by disk hardware failures. For information on setting torn page detection, see ALTER DATABASE SET Options (Transact-SQL).

Checksum Protection

Checksum protection, introduced in SQL Server 2005 (9.x), provides stronger data integrity checking. A checksum is calculated for the data in each page that is written, and stored in the page header. Whenever a page with a stored checksum is read from disk, the database engine recalculates the checksum for the data in the page and raises error 824 if the new checksum is different from the stored checksum. Checksum protection can catch more errors than torn page protection because it is affected by every byte of the page, however, it is moderately resource intensive. When checksum is enabled, errors caused by power failures and flawed hardware or firmware can be detected any time the buffer manager reads a page from disk. For information on setting checksum, see ALTER DATABASE SET Options (Transact-SQL).

Important

When a user or system database is upgraded to SQL Server 2005 (9.x) or a later version, the PAGE_VERIFY value (NONE or TORN_PAGE_DETECTION) is retained. We recommend that you use CHECKSUM.TORN_PAGE_DETECTION may use fewer resources but provides a minimal subset of the CHECKSUM protection.

Understanding Non-uniform Memory Access

SQL Server is non-uniform memory access (NUMA) aware, and performs well on NUMA hardware without special configuration. As clock speed and the number of processors increase, it becomes increasingly difficult to reduce the memory latency required to use this additional processing power. To circumvent this, hardware vendors provide large L3 caches, but this is only a limited solution. NUMA architecture provides a scalable solution to this problem. SQL Server has been designed to take advantage of NUMA-based computers without requiring any application changes. For more information, see How to: Configure SQL Server to Use Soft-NUMA.

User mode[edit]

User mode is made up of various system-defined processes and DLLs.

The interface between user mode applications and operating system kernel functions is called an 'environment subsystem.' Windows NT can have more than one of these, each implementing a different API set. This mechanism was designed to support applications written for many different types of operating systems. None of the environment subsystems can directly access hardware; access to hardware functions is done by calling into kernel mode routines.^{[citation needed]}

There are four main environment subsystems: the Win32 subsystem, an OS/2 subsystem, the Windows Subsystem for Linux and a POSIX subsystem.^[2]

The Win32 environment subsystem can run 32-bit Windows applications. It contains the console as well as text window support, shutdown and hard-error handling for all other environment subsystems. It also supports Virtual DOS Machines (VDMs), which allow MS-DOS and 16-bit Windows (Win16) applications to run on Windows NT. There is a specific MS-DOS VDM that runs in its own address space and which emulates an Intel 80486 running MS-DOS 5.0. Win16 programs, however, run in a Win16 VDM. Each program, by default, runs in the same process, thus using the same address space, and the Win16 VDM gives each program its own thread on which to run. However, Windows NT does allow users to run a Win16 program in a separate Win16 VDM, which allows the program to be preemptively multitasked, as Windows NT will pre-empt the whole VDM process, which only contains one running application. The Win32 environment subsystem process (csrss.exe) also includes the window management functionality, sometimes called a 'window manager'. It handles input events (such as from the keyboard and mouse), then passes messages to the applications that need to receive this input. Each application is responsible for drawing or refreshing its own windows and menus, in response to these messages.
The OS/2 environment subsystem supports 16-bit character-based OS/2 applications and emulates OS/2 1.x, but not 32-bit or graphical OS/2 applications as used with OS/2 2.x or later, on x86 machines only.^[3] To run graphical OS/2 1.x programs, the Windows NT Add-On Subsystem for Presentation Manager must be installed.^[3] The last version of Windows NT to have an OS/2 subsystem was Windows 2000; it was removed as of Windows XP.^[4]^[5]
The POSIX environment subsystem supports applications that are strictly written to either the POSIX.1 standard or the related ISO/IEC standards. This subsystem has been replaced by Interix, which is a part of Windows Services for UNIX.^[4] This was in turn replaced by the Windows Subsystem for Linux.

The security subsystem deals with security tokens, grants or denies access to user accounts based on resource permissions, handles login requests and initiates login authentication, and determines which system resources need to be audited by Windows NT.^{[citation needed]} It also looks after Active Directory.^{[citation needed]} The workstation service implements the network redirector, which is the client side of Windows file and print sharing; it implements local requests to remote files and printers by 'redirecting' them to the appropriate servers on the network.^[6] Conversely, the server service allows other computers on the network to access file shares and shared printers offered by the local system.^[7]

Kernel mode[edit]

Windows NT kernel mode has full access to the hardware and system resources of the computer and runs code in a protected memory area.^[8] It controls access to scheduling, thread prioritization, memory management and the interaction with hardware. The kernel mode stops user mode services and applications from accessing critical areas of the operating system that they should not have access to; user mode processes must ask the kernel mode to perform such operations on their behalf.

While the x86 architecture supports four different privilege levels (numbered 0 to 3), only the two extreme privilege levels are used. Usermode programs are run with CPL 3, and the kernel runs with CPL 0. These two levels are often referred to as 'ring 3' and 'ring 0', respectively. Such a design decision had been done to achieve code portability to RISC platforms that only support two privilege levels,^[9] though this breaks compatibility with OS/2 applications that contain I/O privilege segments that attempt to directly access hardware.^[10]

Code running in kernel mode includes: the executive, which is itself made up of many modules that do specific tasks; the kernel, which provides low-level services used by the Executive; the Hardware Abstraction Layer (HAL); and kernel drivers.^[8]^[11]

Executive[edit]

The Windows Executive services make up the low-level kernel-mode portion, and are contained in the file NTOSKRNL.EXE.^[8] It deals with I/O, object management, security and process management. These are divided into several subsystems, among which are Cache Manager, Configuration Manager, I/O Manager, Local Procedure Call (LPC), Memory Manager, Object Manager, Process Structure and Security Reference Monitor (SRM). Grouped together, the components can be called Executive services (internal name Ex). System Services (internal name Nt), i.e., system calls, are implemented at this level, too, except very few that call directly into the kernel layer for better performance.^{[citation needed]}

The term 'service' in this context generally refers to a callable routine, or set of callable routines. This is distinct from the concept of a 'service process', which is a user mode component somewhat analogous to a daemon in Unix-like operating systems.

Each object in Windows NT exists in a global namespace. This is a screenshot from SysinternalsWinObj.

Object Manager: The Object Manager (internal name Ob) is an executive subsystem that all other executive subsystems, especially system calls, must pass through to gain access to Windows NT resources—essentially making it a resource management infrastructure service.^[12] The object manager is used to reduce the duplication of object resource management functionality in other executive subsystems, which could potentially lead to bugs and make development of Windows NT harder.^[13] To the object manager, each resource is an object, whether that resource is a physical resource (such as a file system or peripheral) or a logical resource (such as a file). Each object has a structure or object type that the object manager must know about.

Object creation is a process in two phases, creation and insertion. Creation causes the allocation of an empty object and the reservation of any resources required by the object manager, such as an (optional) name in the namespace. If creation was successful, the subsystem responsible for the creation fills in the empty object.^[14] Finally, if the subsystem deems the initialization successful, it instructs the object manager to insert the object, which makes it accessible through its (optional) name or a cookie called a handle.^[15] From then on, the lifetime of the object is handled by the object manager, and it's up to the subsystem to keep the object in a working condition until being signaled by the object manager to dispose of it.^[16]

Handles are identifiers that represent a reference to a kernel resource through an opaque value.^[17] Similarly, opening an object through its name is subject to security checks, but acting through an existing, open handle is only limited to the level of access requested when the object was opened or created.^{[citation needed]}

Object types define the object procedures and any data specific to the object. In this way, the object manager allows Windows NT to be an object-oriented operating system, as object types can be thought of as polymorphic classes that define objects. Most subsystems, though, with a notable exception in the I/O Manager, rely on the default implementation for all object type procedures.^{[citation needed]}

Each instance of an object that is created stores its name, parameters that are passed to the object creation function, security attributes and a pointer to its object type. The object also contains an object close procedure and a reference count to tell the object manager how many other objects in the system reference that object and thereby determines whether the object can be destroyed when a close request is sent to it.^[18] Every named object exists in a hierarchical object namespace.

Cache Controller: Closely coordinates with the Memory Manager, I/O Manager and I/O drivers to provide a common cache for regular file I/O. The Windows Cache Manager operates on file blocks (rather than device blocks), for consistent operation between local and remote files, and ensures a certain degree of coherency with memory-mapped views of files, since cache blocks are a special case of memory-mapped views and cache misses a special case of page faults.

Configuration Manager: Implements the Windows Registry.

I/O Manager: Allows devices to communicate with user-mode subsystems. It translates user-mode read and write commands into read or write IRPs which it passes to device drivers. It accepts file system I/O requests and translates them into device specific calls, and can incorporate low-level device drivers that directly manipulate hardware to either read input or write output. It also includes a cache manager to improve disk performance by caching read requests and write to the disk in the background.

Local Procedure Call (LPC): Provides inter-process communication ports with connection semantics. LPC ports are used by user-mode subsystems to communicate with their clients, by Executive subsystems to communicate with user-mode subsystems, and as the basis for the local transport for Microsoft RPC.

Memory Manager: Manages virtual memory, controlling memory protection and the paging of memory in and out of physical memory to secondary storage, and implements a general-purpose allocator of physical memory. It also implements a parser of PE executables that lets an executable be mapped or unmapped in a single, atomic step.

Starting from Windows NT Server 4.0, Terminal Server Edition, the memory manager implements a so-called session space, a range of kernel-mode memory that is subject to context switching just like user-mode memory. This lets multiple instances of the kernel-mode Win32 subsystem and GDI drivers run side-by-side, despite shortcomings in their initial design. Each session space is shared by several processes, collectively referred to as a 'session'.

To ensure a degree of isolation between sessions without introducing a new object type, the association between processes and sessions is handled by the Security Reference Monitor, as an attribute of a security subject (token), and it can only be changed while holding special privileges.

The relatively unsophisticated and ad-hoc nature of sessions is due to the fact they weren't part of the initial design, and had to be developed, with minimal disruption to the main line, by a third party (Citrix Systems) as a prerequisite for their terminal server product for Windows NT, called WinFrame. Starting with Windows Vista, though, sessions finally became a proper aspect of the Windows architecture. No longer a memory manager construct that creeps into user mode indirectly through Win32, they were expanded into a pervasive abstraction affecting most Executive subsystems. As a matter of fact, regular use of Windows Vista always results in a multi-session environment.^[19]

Process Structure: Handles process and thread creation and termination, and it implements the concept of Job, a group of processes that can be terminated as a whole, or be placed under shared restrictions (such a total maximum of allocated memory, or CPU time). Job objects were introduced in Windows 2000.

PnP Manager: Handles plug and play and supports device detection and installation at boot time. It also has the responsibility to stop and start devices on demand—this can happen when a bus (such as USB or IEEE 1394 FireWire) gains a new device and needs to have a device driver loaded to support it. Its bulk is actually implemented in user mode, in the Plug and Play Service, which handles the often complex tasks of installing the appropriate drivers, notifying services and applications of the arrival of new devices, and displaying GUI to the user.

Power Manager: Deals with power events (power-off, stand-by, hibernate, etc.) and notifies affected drivers with special IRPs (Power IRPs).

Security Reference Monitor (SRM): The primary authority for enforcing the security rules of the security integral subsystem.^[20] It determines whether an object or resource can be accessed, via the use of access control lists (ACLs), which are themselves made up of access control entries (ACEs). ACEs contain a Security Identifier (SID) and a list of operations that the ACE gives a select group of trustees—a user account, group account, or login session^[21]—permission (allow, deny, or audit) to that resource.^[22]^[23]

GDI: The Graphics Device Interface is responsible for tasks such as drawing lines and curves, rendering fonts and handling palettes. The Windows NT 3.x series of releases had placed the GDI component in the user-mode Client/Server Runtime Subsystem, but this was moved into kernel mode with Windows NT 4.0 to improve graphics performance.^[24]

Kernel[edit]

The kernel sits between the HAL and the Executive and provides multiprocessor synchronization, thread and interrupt scheduling and dispatching, and trap handling and exception dispatching; it is also responsible for initializing device drivers at bootup that are necessary to get the operating system up and running. That is, the kernel performs almost all the tasks of a traditional microkernel; the strict distinction between Executive and Kernel is the most prominent remnant of the original microkernel design, and historical design documentation consistently refers to the kernel component as 'the microkernel'.

The kernel often interfaces with the process manager.^[25] The level of abstraction is such that the kernel never calls into the process manager, only the other way around (save for a handful of corner cases, still never to the point of a functional dependence).

Kernel-mode drivers[edit]

Windows NT uses kernel-mode device drivers to enable it to interact with hardware devices. Each of the drivers has well defined system routines and internal routines that it exports to the rest of the operating system. All devices are seen by user mode code as a file object in the I/O manager, though to the I/O manager itself the devices are seen as device objects, which it defines as either file, device or driver objects. Kernel mode drivers exist in three levels: highest level drivers, intermediate drivers and low level drivers. The highest level drivers, such as file system drivers for FAT and NTFS, rely on intermediate drivers. Intermediate drivers consist of function drivers—or main driver for a device—that are optionally sandwiched between lower and higher level filter drivers. The function driver then relies on a bus driver—or a driver that services a bus controller, adapter, or bridge—which can have an optional bus filter driver that sits between itself and the function driver. Intermediate drivers rely on the lowest level drivers to function. The Windows Driver Model (WDM) exists in the intermediate layer. The lowest level drivers are either legacy Windows NT device drivers that control a device directly or can be a PnP hardware bus. These lower level drivers directly control hardware and do not rely on any other drivers.

Hardware abstraction layer[edit]

The Windows NT hardware abstraction layer, or HAL, is a layer between the physical hardware of the computer and the rest of the operating system. It was designed to hide differences in hardware and provide a consistent platform on which the kernel is run. The HAL includes hardware-specific code that controls I/O interfaces, interrupt controllers and multiple processors.

However, despite its purpose and designated place within the architecture, the HAL isn't a layer that sits entirely below the kernel, the way the kernel sits below the Executive: All known HAL implementations depend in some measure on the kernel, or even the Executive. In practice, this means that kernel and HAL variants come in matching sets that are specifically constructed to work together.