Performance Tuning Best Practices for SQL on Azure VM

In this post, I am going to talk about Performance Tuning Best Practices for SQL running on an Azure VM with key considerations. Understand the characteristics of running SQL Server on a Windows Azure VM (IaaS). and what impacts performance. Additionally, learn how to monitor and troubleshoot performance of SQL Server running on Windows Azure VM. Let’s start with the best Practices for SQL on Azure VM.

Note: If you buy something from our links, we might earn a commission. See our disclosure statement.

Best Practices for SQL on Azure VM

OLTP systems with transactional workloads

Large number of small transactions with both Reads as well as Inserts & Updates
Transactions are mostly similar in pattern
Significant amount of reads (Varies between 60% to 80%)
Random Read/Writes between 8k-64k chunks.
User concurrency is High.

Read my earlier post on SQL Server Performance Tuning for All Editions.

Windows Azure VM Characteristics

Architecture
Latency
Variability
Configuration options

VM Configuration Options

Virtual Machine size
Network bandwidth
Disk types and configurations
Disk cache settings

VM Disk Options

Default OS disk with persistent data and caching.

This is a 127GB disk optimized for OS access patterns and Windows (or Linux) boot up time. This is not meant for either transactional or Data Warehouse (Analytical) workloads.

Attach additional Persistent Data Disks

You can add up to 16 disks with 1 TB in size and 500 IOPS each.

Temporary local disk drives which are a non-persistent disk

It is used for temporary data storage & OS page files. This is not meant for data storage or SQL Server files.
Hosted in attached disks on every physical host. This means that it is not a part of the Azure Blob Storage and does not give you the regular SLA provided by Azure.
It is cleaned up in the event of a VM Failover from one host system to another or recycling.
Physical disks are shared across other VMs on the same physical machine. This means that the performance is not guaranteed and can have a large variance.
*Not recommended for user or system database files

If you agree to all the conditions mentioned above, you can still go ahead and put the TEMP DB system files on the non-persistent data drive for performance reasons.

Windows Azure IO subsystem

The storage disks are implemented as a shared multi-tenant service with built-in triple redundancy, and with optional geo-redundancy across regions. Given below are the main characteristics of Azure Disk subsystem:

Performance is less predictable than on-premises due to multi-tenancy and other reasons.
It is a shared infrastructure, where host machines, storage services, network bandwidth are shared between subscribers.
Performance can depend on which region the VM is provisioned. If your VM is at a far away distance, then it will have a direct impact on network latency.
Subject to downtime and maintenance operations. Availability still stands at 99.99%.
Choose between granular control & configuration for on-premise deployment vs. cost, simplicity, out of box redundancy for Azure deployments.

How does VM Disk Caching work?

The VM disk is cached in both inside physical host machine and also to an external Azure Page Blobs. This greatly reduces disk I/O latency by reducing the number of transactions hitting against the Azure Storage subsystem.

The VM Cache is a 2-tier cache which stores recently accessed data in host RAM cache – space shared by all VMs on the machine in the form of Page Blobs. The second Tier stores less recently accessed data stored on local hard disks of the physical host machine. There is a reserved cache space for VM “OS Disk” and “Data Disks” based on the VM size.

VM cache settings

Read Only: All requests cached for future reads. All writes persisted directly to Windows Azure Storage Blobs.

Read Write: This is a kind of a “Read Ahead” algorithm. The reads and writes are cached for future reads. Non-write-through writes persisted to local cache first. For SQL Server, writes are persisted to WA storage because it uses Write-through It provides the lowest disk latency for light workloads

None (disabled): This allows you to bypasses the cache. All the data are transferred to disk and persisted to the Windows Azure Storage. This method gives you the highest I/O rate for I/O intensive workloads. You also need to take “Transaction Cost” into consideration.

Caching by default is disabled on the Data Disks. You can enable caching of upto 4 Read only and 4 Read write disks for data drives.

Disk Caching Best Practices for SQL Server

For the OS drive, the default caching is “Read Write” which reduces latency as mentioned before.

For Data Disks, the default is “None (Disabled)” which gives you the highest throughput. For Read intensive workload, Enable “Read Only” caching for better performance.

VM network performance best practices

Network latency is higher compared to an on-premise deployment. Reducing network round trips can greatly improve performance.
Consolidate applications which have high transactions or “Chatty” Apps on the same virtual machine to reduce round-trips.
Host your VM’s on the same V-NET & Cloud Service.
VM’s with Always On availability groups needs to be in the same Cloud Service. Routing of Read-Only transactions to secondaries is recommended using a listener on the Public IP interface.

Azure Disk Configuration Best Practices

For Databases of <1TB storage, a Single Data Disk can be recommended with acceptable performance.
For >1TB DB files and Higher IOPS requirement, use multiple data disks with the following configuration:

[visualizer id=”910″]

Just like SQL Server On-Premises, use Filegroups with multiple database files(MDF* & Ldf*), and place each database file on a separate data disks.
For more performance, you can create a striped Disk using multiple data disks using the Windows Server storage space.
Storage spaces are recommended over OS striped volumes.
Spreading the I/O workload across a number of data disks benefits smaller random access patterns for OLTP operations where IOPS and bandwidth scale in a nearly linear fashion.
As the I/O block size increases, for reading operations, IOPS or bandwidth doesn’t linearly scale by adding more data disks.
If your workload is read intensive with analytical queries, adding more disks will not necessarily help.
For write intensive workload, adding more data disks can increase performance in a nearly linear fashion. This means that you can benefit from placing each transaction log for multiple databases on a separate data disk.
For large sequential, I/O block sizes write generally scale better than reads.

Use Data Compression to reduce IO

As you can see, there is a significant reduction of logical and physical reads with page compression enabled, and the elapsed time. Unsurprisingly, CPU time of the query does go up with page compression as expected, because SQL Server needs to decompress the data while returning results to the query. Please note that your results will vary, depending on your workload and scenario.

Use these data compression techniques for I/O intensive workloads and then decide which tables and indexes to compress. If your CPU is already highly utilized, do not implement this or upgrade the VM to a higher service tier.This is not enabled by default in Azure VM images. Reduces IO for Creating a DB Restoring, DB Adding files to a DB, Extending file size, Autogrow, etc. Add SQL service account to Perform Volume Maintenance Tasks security policy.

This step will not help, if you have performance issues at the Transaction Log throughput level.

Recommendations for Instant File Initialization

Ensure that you Pre-size all database files appropriately
Configure Instant File Initialization before creating database
Configure Instant File Initialization first and then extend files like TEMPDB.
Configure Instant File Initialization first before restoring a database on a new server or on a newly create VM.
Restart SQL Server Service for the configuration manager after configuring Instant File Initialization.

[visualizer id=”918″]

Other Recommendations for Data Disk Performance

Spreading Data files of a single Database into multiple storage accounts is not recommended. This is because Blobs are not written at the same time and could be out of sync. Instead, spread the data files of different databases across multiple disks to achieve higher IOPS / bandwidth performance.

A storage account has a hard limit of 20000 tps (Transactions per second)

Performance Counter to Capture and Analyse Database performance on an Azure VM

See my post on Free Tools for Monitoring SQL Server Performance

LogicalDisk(*)Avg. Disk Bytes/Read

LogicalDisk(*)Avg. Disk Bytes/Write

LogicalDisk(*)Avg. Disk Queue Length

LogicalDisk(*)Avg. Disk sec/Read

LogicalDisk(*)Avg. Disk sec/Write

LogicalDisk(*)Current Disk Queue Length

LogicalDisk(*)Disk Read Bytes/sec

LogicalDisk(*)Disk Reads/sec

LogicalDisk(*)Disk Write Bytes/sec

LogicalDisk(*)Disk Writes/sec

MemoryAvailable MBytes

MemoryFree System Page Table Entries

MemoryPages/sec

Network Interface(*)*

Process(*)*

Processor(*)% Privileged Time

Processor(*)% Processor Time

SQLServer:Availability Replica(*)*

SQLServer:Access Methods*

SQLServer:Buffer Manager*

SQLServer:Buffer Node*

SQLServer:Databases(*)*

SQLServer:Database Replica(*)*

SQLServer:General Statistics*

SQLServer:Latches*

SQLServer:Locks(_Total)Average Wait Time (ms)

SQLServer:Locks(_Total)Lock Requests/sec

SQLServer:Memory Manager*

SQLServer:Plan Cache*

SQLServer:Wait Statistics*

SQLServer:SQL Statistics*

SystemContext Switches/sec

SystemProcessor Queue Length

Use tools like PAL from Codeplex to analyze the results.

I hope this was helpful. Please comment below to give feedback.

References

I hope that you like this post on Best Practices for SQL on Azure VM. Any comments or suggestions are welcome in the comment section below. Cheers!

Affiliate Disclosure: Faceofit.com is a participant in the Amazon Services LLC Associates Program. As an Amazon Associate we earn from qualifying purchases.