Update Statistics for a partition

June 22, 2009

Last week, I was working with a client and we wanted to update statistics on a partition. After trying it out, then consulting BOL (Books Online) and finally Twitter, I determined that it could not be done. Thanks to Lara Rubbelke for her help and for pointing out the second issue discussed below.

In addition to that problem, when a partition's index is defragmented, its statistics are not updated. I suppose an easy workaround is to update statistics afterward, but that updates statistics across all of the partitions and/or indexes of the object, not just the one that was defragmented.
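To make the gap concrete, here is a minimal sketch of the workaround, using hypothetical names (dbo.Sales, IX_Sales_OrderDate) and assuming partition 5 is the one that was just defragmented. STATS_DATE shows whether the partition rebuild touched the statistics. Naming a single statistics object in UPDATE STATISTICS narrows the work to that index, but it still reads the entire table, which is exactly the overhead described above.

```sql
-- Hypothetical objects: dbo.Sales and its index IX_Sales_OrderDate.

-- 1. Rebuild only the partition that changed (partition 5 is an assumption).
ALTER INDEX IX_Sales_OrderDate ON dbo.Sales
REBUILD PARTITION = 5;

-- 2. Check when each statistics object on the table was last updated.
SELECT s.name                              AS stats_name,
       STATS_DATE(s.object_id, s.stats_id) AS last_updated
FROM sys.stats AS s
WHERE s.object_id = OBJECT_ID(N'dbo.Sales');

-- 3. Workaround: refresh just this index's statistics object.
--    FULLSCAN still reads every partition of dbo.Sales, not only partition 5.
UPDATE STATISTICS dbo.Sales IX_Sales_OrderDate WITH FULLSCAN;
```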

So, I created a suggestion on connect.microsoft.com. I'll post the entire subject and issue below, but please visit the site, vote and validate this request.

This is an https:// link, so if you aren't logged in to Connect it will not work. That said, if you don't have a login for this site, I'd recommend getting one; there is a ton of great information there. To search for it instead, go to connect.microsoft.com and search for "Update Statistics at the partition level" or FeedbackID 468517. https://connect.microsoft.com/SQLServer/feedback/ViewFeedback.aspx?FeedbackID=468517

Full description and proposed solution below:

Description:
I'd like to see UPDATE STATISTICS modified to accept a partition ID in combination with the existing table or indexed view name.

With the improvements in SQL Server 2008 and partitioning, one of my clients that uses partitioning is seeing very large result sets. Having to update statistics on partitions that haven't changed, just to refresh the one partition that actually needs it, creates unnecessary overhead. It also forces changes to plans that otherwise wouldn't be altered, which causes memory churn in the procedure cache.

One other related note: you can defragment an individual partition of an index, but doing so will not update its statistics.

Proposed Solution
Two parts:
1.) Modify UPDATE STATISTICS to include an option for declaring either a partition name or a partition ID for either a table or an indexed view, e.g.:
UPDATE STATISTICS table_or_indexed_view_name
[
  {
    { index_or_statistics_name }
    | ( { partition_index_or_partition_statistics_name } [ ,...n ] )
  }
]
2.) When an individual partition of an index is defragmented, have it update the statistics for that partition.

Benefits
Improved Reliability
Improved Administration
Improved Performance


Meta Data – Cardinality Explored

June 18, 2009

Last night I had the privilege of speaking at the Colorado Springs SQL Server Users Group. The subject was Meta Data – Cardinality Explored.

This topic is a very enjoyable one to me. Along with the DMVs that SQL Server now provides, the metadata about your database implementations is critical to their scalability, stability, accuracy and performance.

Over the years, I have found that cardinality is not often well understood, and I don't think it's discussed very often either. It is an absolutely critical data set for the developers, DBAs, testers and performance testers out there.
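The attached script isn't reproduced here, so as a rough illustration only: assuming "cardinality" means how many rows each table holds and how many distinct values a column holds, these two queries pull both. dbo.Orders and CustomerID are hypothetical names.

```sql
-- Row count per user table (heap = index_id 0, clustered index = index_id 1),
-- summed across all partitions.
SELECT OBJECT_SCHEMA_NAME(ps.object_id) AS schema_name,
       OBJECT_NAME(ps.object_id)        AS table_name,
       SUM(ps.row_count)                AS row_count
FROM sys.dm_db_partition_stats AS ps
WHERE ps.index_id IN (0, 1)
  AND OBJECTPROPERTY(ps.object_id, 'IsUserTable') = 1
GROUP BY ps.object_id
ORDER BY row_count DESC;

-- Distinct values in one column (hypothetical table and column).
-- COUNT(DISTINCT ...) ignores NULLs.
SELECT COUNT(*)                   AS total_rows,
       COUNT(DISTINCT CustomerID) AS distinct_values,
       CAST(COUNT(DISTINCT CustomerID) AS float) / NULLIF(COUNT(*), 0) AS distinct_ratio
FROM dbo.Orders;
```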

Attached to this post you will find my SQL script and PowerPoint slide deck. I'd recommend starting with the slide deck: walk through it, and when you see “demo”, open the SQL file and learn how this result set can make an impact on the databases that you are responsible for or those that you are developing against.

Have fun, and if you have questions about the presentation or the SQL code I put together, feel free to comment!

Thanks again to the terrific folks down in Colorado Springs for the warm welcome and not falling asleep as I droned on and on. =)

CSSQLUG_June09
MetaData – Cardinality Explored – SQL


Schema Management (DDL Triggers)

April 29, 2009

Over the years I have used a variety of tools for schema management. The one I relied on for quite a while was from RedGate. They offer schema compare and data compare tools that make quick work of determining the delta between two environments.

Recently I've been working with DB Pro to manage this, and thus far I've been very, very impressed. However, it has a downside: primarily the cost. Some of my clients just can't justify (or stomach) the dollars involved. To fill that gap, I have been using DDL triggers. The result is an automated log, stored in a database table, that captures DDL changes (CREATE, DROP, ALTER). This differs from a traditional trigger in that it fires only for database (DDL) events, not data events. In other words, the scope of this trigger is the schema or structure level, not the data or DML level.

The value it has provided some of my clients is simply certainty about what happened in a specific environment. It also saves a great deal of time when debugging, troubleshooting or otherwise trying to figure out why something that worked perfectly in the database yesterday does not work correctly today.

There are a few steps to make this work. The first is to decide whether you want to store this data in its own database or in a table in an existing database; either works. In the code below, I've created a stand-alone database with one table in it.

CREATE DATABASE [DDL_ChangeLog];
GO

USE DDL_ChangeLog;
GO

CREATE TABLE DDLChangeLog (
    DDLChangeLogID int IDENTITY(1,1) NOT NULL,
    eventtype      nvarchar(100),
    objectname     nvarchar(100),
    objecttype     nvarchar(100),
    databasename   nvarchar(100),
    schemaname     nvarchar(100),
    SystemUser     nvarchar(255),
    AppName        nvarchar(255),
    HostName       nvarchar(255),
    tsql           nvarchar(MAX),
    createddate    datetime,
    CONSTRAINT [PK_DDL_Changelog] PRIMARY KEY CLUSTERED (DDLChangeLogID ASC)
);
GO

The code above will simply create a database and a table.

Once that is in place, switch to the database whose DDL changes you want to track and create this database trigger.

CREATE TRIGGER [trgLogDDLEvent] ON DATABASE
FOR DDL_DATABASE_LEVEL_EVENTS
AS
DECLARE @data XML;
SET @data = EVENTDATA();

-- Log every database-level DDL event except CREATE_STATISTICS
IF @data.value('(/EVENT_INSTANCE/EventType)[1]', 'nvarchar(100)') <> 'CREATE_STATISTICS'
    INSERT INTO DDL_ChangeLog..DDLChangeLog
        (EventType, ObjectName, ObjectType, DatabaseName, SchemaName,
         SystemUser, AppName, HostName, tsql, createddate)
    VALUES (
        @data.value('(/EVENT_INSTANCE/EventType)[1]',    'nvarchar(100)'),
        @data.value('(/EVENT_INSTANCE/ObjectName)[1]',   'nvarchar(100)'),
        @data.value('(/EVENT_INSTANCE/ObjectType)[1]',   'nvarchar(100)'),
        @data.value('(/EVENT_INSTANCE/DatabaseName)[1]', 'nvarchar(100)'),
        @data.value('(/EVENT_INSTANCE/SchemaName)[1]',   'nvarchar(100)'),
        SYSTEM_USER, APP_NAME(), HOST_NAME(),
        @data.value('(/EVENT_INSTANCE/TSQLCommand)[1]',  'nvarchar(max)'),
        GETDATE());
GO

ENABLE TRIGGER [trgLogDDLEvent] ON DATABASE;
GO

So, let’s test this out.

In the database where you just created the trigger, run something simple like this:

SELECT 1 AS test INTO TestTable;
SELECT * FROM TestTable;
DROP TABLE TestTable;

SELECT * FROM DDL_ChangeLog..DDLChangeLog;

Wow! There is data there. It should have returned two rows, even though there are three statements in our test. That's because the trigger fired only for the CREATE TABLE (the SELECT ... INTO) and the DROP TABLE statements; it does not fire for the plain SELECT.
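Once rows are flowing into the log, a query like the one below is a convenient way to review recent changes. It assumes the DDL_ChangeLog database and table created earlier in this post. The DISABLE and DROP statements are standard syntax for pausing or removing a database-scoped DDL trigger; they are commented out so they don't run by accident.

```sql
-- Most recent DDL changes recorded for the current database, newest first.
SELECT TOP (50)
       createddate, eventtype, schemaname, objectname,
       SystemUser, HostName, AppName, tsql
FROM DDL_ChangeLog..DDLChangeLog
WHERE databasename = DB_NAME()
ORDER BY createddate DESC;

-- To pause or remove logging, run one of these in the tracked database:
-- DISABLE TRIGGER [trgLogDDLEvent] ON DATABASE;
-- DROP TRIGGER [trgLogDDLEvent] ON DATABASE;
```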

I initially started using these for my own purposes; however, as time has gone on, I've found that other audiences LOVE knowing this can be tracked. Auditors love this functionality. QA loves being able to verify that the black box they are testing has some controls and audit processes around it. Management loves it because they have hard data to take back to the business in the event of an outage, mistake or whatever. I've found that it's typically best to be as transparent as possible about changes and modifications, and about the state of the databases I'm responsible for. In this manner, I'm able to better serve my clients, and in turn they are more likely to trust my opinion when it counts.


Policy Management

April 27, 2009

In November, I had the opportunity to present at DevConnections. I spoke on two topics, Policy Management and Plan Guides. Policy Management is new in SQL Server 2008, and Plan Guides, first introduced in SQL Server 2005, were enhanced in 2008; I really love both of these additions to the SQL Server platform.

In this blog post I have attached two files: one is my PowerPoint slide deck and the other is the script I used for the demos.

This was a very difficult talk to give, as the feature was relatively new and information about it was scarce when I was developing the content.

So, what is Policy Management? Microsoft's definition is as follows: “Policy-Based Management is a policy-based system for managing one or more instances of SQL Server 2008. Use this with SQL Server Management Studio to create policies that manage entities on the server, such as the instance of SQL Server, databases, and other SQL Server objects.” What this definition leaves out is something of great importance: this new feature CAN be used with down-level versions, provided that the facet you want to wrap a policy around is available in that version (against those versions, policies are evaluated on demand). Meaning, if you want to check data files or schema objects on a SQL Server 2005 instance, you can!

I found there to be three components to Policy Management.

  1. Policy Management  — Creation of conditions and policies
  2. Explicit Administration — One to many managed targets
  3. Automated Administration — Automate enforcement of policies

Each of these has its own components; refer to the slide deck and Microsoft's site for more information.

One of the biggest helps I had when learning Policy Management was the following CTE.

The results of this query list all of the available facets and the evaluation modes each one supports. This is a huge help, since going through the GUI to see whether something is available for a certain mode can take a while.

USE msdb;
GO

-- Evaluation modes; execution_mode in syspolicy_management_facets is a bitmask of these values
WITH automatedpolicyexecutionmode (modeid, modename)
AS
(
    SELECT * FROM (VALUES (0, 'On Demand'),
                          (1, 'Enforce Compliance'),
                          (2, 'Check on Change and Log'),
                          (4, 'Check on Schedule and Log'))
           AS em (modeid, modename)
)
-- A facet supports a mode when (execution_mode & modeid) = modeid,
-- so On Demand (0) matches every facet
SELECT dmf.management_facet_id AS facetid, dmf.name AS facetname, apemode.modename
FROM dbo.syspolicy_management_facets AS dmf
     INNER JOIN automatedpolicyexecutionmode AS apemode
         ON (dmf.execution_mode & apemode.modeid) = apemode.modeid
ORDER BY dmf.name, apemode.modename;

So far I have successfully implemented policies against a host of the facets offered. Microsoft is going to continue to build on this initial release, so look for new facets as time goes on.

Enjoy Policy-Based Management, and may it save you a great deal of time, shrink your daily and weekly checklists and enable you to provide further value to your organization.

If you walk through the slide deck and the demo script, you'll quickly see how easy it is to create a policy. Once you have created one (IN DEVELOPMENT! =) ), evaluate it twice. The first time, make sure the evaluation succeeds. The second time, break it so that the policy evaluates to false. Once you have done this, run the query below.

USE msdb;
GO

-- Policy evaluations that failed (result = 0), newest first
SELECT a.execution_date          AS [Date Run],
       c.name                    AS [Policy],
       a.target_query_expression AS [Policy Failure Targets],
       d.name                    AS [Condition],
       d.description             AS [Condition Description],
       d.facet                   AS [Condition Facet]
FROM dbo.syspolicy_policy_execution_history_details_internal AS a
     INNER JOIN dbo.syspolicy_policy_execution_history_internal AS b ON a.history_id = b.history_id
     INNER JOIN dbo.syspolicy_policies_internal AS c ON b.policy_id = c.policy_id
     INNER JOIN dbo.syspolicy_conditions AS d ON c.condition_id = d.condition_id
WHERE a.result = 0
ORDER BY a.execution_date DESC, c.name ASC, d.name ASC;

Attachments for this post:

Script: policymanagementscript2

Slide Deck: sql-connections-conference-slides-policy-management2


Partitioning

April 16, 2009

Over the past few years I have implemented partitioning a number of ways. In the world of SQL Server 2000, it was “home-made”: I would create a number of tables with identical schemas, add check constraints and then link all of the data together with views. This method had tremendous upside but, as you can imagine, moving data from one “partition” to another was very time-consuming and expensive from an I/O perspective.
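For readers who never built one, the “home-made” approach looks roughly like the sketch below: identical member tables, a CHECK constraint on the partitioning column of each, and a UNION ALL view over them. The table, view and column names are hypothetical, and a real implementation would have one member table per period.

```sql
-- Hypothetical member tables: identical schemas, one per year,
-- each with a CHECK constraint on the partitioning column.
CREATE TABLE dbo.Sales2008 (
    SaleID   int      NOT NULL,
    SaleDate datetime NOT NULL
        CONSTRAINT CK_Sales2008_SaleDate
        CHECK (SaleDate >= '20080101' AND SaleDate < '20090101'),
    Amount   money    NULL,
    CONSTRAINT PK_Sales2008 PRIMARY KEY (SaleID, SaleDate)
);

CREATE TABLE dbo.Sales2009 (
    SaleID   int      NOT NULL,
    SaleDate datetime NOT NULL
        CONSTRAINT CK_Sales2009_SaleDate
        CHECK (SaleDate >= '20090101' AND SaleDate < '20100101'),
    Amount   money    NULL,
    CONSTRAINT PK_Sales2009 PRIMARY KEY (SaleID, SaleDate)
);
GO

-- The view stitches the members together; the CHECK constraints let the
-- optimizer skip member tables that cannot satisfy a SaleDate predicate.
CREATE VIEW dbo.SalesAll
AS
SELECT SaleID, SaleDate, Amount FROM dbo.Sales2008
UNION ALL
SELECT SaleID, SaleDate, Amount FROM dbo.Sales2009;
GO
```

Moving a row to a different “partition” means deleting it from one member table and inserting it into another, which is where the I/O cost mentioned above comes from.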

With SQL Server 2005, Microsoft introduced partitioning as part of the relational engine. This was huge news for me: I could actually do this “right” for my clients. As many of you know, partitioning in SQL Server 2005 had some limitations. It was a great option, but the limited parallelism that accompanied the feature left a bad taste in the mouths of many.

This brings us to SQL Server 2008 where, based on my testing, the parallelism issue is truly resolved and the optimizer recognizes the partitions as it should.

So, what have I done? I've implemented it all over the place, and that is the point of this post. At one client, it's worked just as I imagined it would; at another, I've hit a bit of a stumbling block. I did not realize the impact that non-clustered indexes would have if they weren't created on the partition scheme (that is, aligned with the table's partitions). What I ended up with, over time, were very, very, very heavily fragmented indexes. In fact, some of them were so bad that the optimizer was choosing different paths to the data. (That's B.A.D.)
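A quick way to spot this situation is to check where each index on a partitioned table lives. In the sketch below (dbo.Sales is a hypothetical name), an index whose data space is a filegroup rather than a partition scheme is not aligned with the table's partitions.

```sql
-- Where does each index on the (hypothetical) partitioned table dbo.Sales live?
SELECT i.name       AS index_name,
       i.type_desc  AS index_type,
       ds.name      AS data_space,
       ds.type_desc AS data_space_type   -- PARTITION_SCHEME or ROWS_FILEGROUP
FROM sys.indexes AS i
INNER JOIN sys.data_spaces AS ds
        ON ds.data_space_id = i.data_space_id
WHERE i.object_id = OBJECT_ID(N'dbo.Sales')
  AND i.index_id > 0;   -- skip the heap entry, if any
```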

I thought my fix would be a tough one, but it turned out to be uber simple. I dropped the indexes that existed on the primary filegroup and re-created them on the partition scheme. Voilà, piece of cake. Everything was awesome, until about two weeks later, when the fragmentation of those indexes caught up with me. That left me cursing the amount of time it was going to take to rebuild these huge indexes and wondering how often I'd have to do it…
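Re-creating a non-clustered index on the partition scheme looks roughly like this. The names are hypothetical (dbo.Sales, IX_Sales_CustomerID, and a partition scheme psSales partitioned on OrderDate). Where the post describes dropping and re-creating the indexes, this sketch swaps in the DROP_EXISTING option to do both in one statement; it is not necessarily how the original fix was scripted.

```sql
-- Move an existing non-aligned index onto the partition scheme.
-- DROP_EXISTING = ON drops and rebuilds the index in a single operation.
CREATE NONCLUSTERED INDEX IX_Sales_CustomerID
    ON dbo.Sales (CustomerID)
    WITH (DROP_EXISTING = ON)
    ON psSales (OrderDate);   -- partition scheme and its partitioning column
```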

So, I stopped trying to do it my way and went to my favorite resource.  BOL.  Of course, there is a way to do this and it’s not painful.

It’s the function $PARTITION.  If you are using partitioning in your environment and you are unaware of this function, become aware. 
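$PARTITION.function_name(expression) returns the partition number that a value maps to under a given partition function. As a sketch, the query below counts rows per partition; PartitionName is a placeholder for your partition function's name, and dbo.Sales and OrderDate are hypothetical.

```sql
-- Rows per partition, mapping each OrderDate through the partition function.
SELECT $PARTITION.PartitionName(OrderDate) AS partition_number,
       COUNT(*)                            AS row_count
FROM dbo.Sales
GROUP BY $PARTITION.PartitionName(OrderDate)
ORDER BY partition_number;
```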

What this function enabled, in a very easy and straightforward manner, was for me to quickly and dynamically identify the active partition. Once I had that information, I simply wrote an ALTER INDEX ... REBUILD statement for the partition number held in my variable.

Code snippet below.

DECLARE @PartitionID int;

-- $PARTITION.PartitionName(value) returns the partition number that value maps to
-- (PartitionName is the name of the partition function)
SELECT @PartitionID = $PARTITION.PartitionName(GETDATE());

ALTER INDEX IndexName ON owner.TableName
REBUILD PARTITION = @PartitionID;

Note that I can use the GETDATE() function with $PARTITION because my partition function (key) is a datetime value. Similar simple logic could be used if it were an int.

The obvious advantage to this is that when I run the rebuild index script, it is only running for the active partition.  This brings down the amount of time required for this action and enables me to run it nightly as the data becomes very fragmented from day to day in the current partition.
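The nightly job could go one step further and check fragmentation for just the active partition before choosing between REORGANIZE and REBUILD. This is a sketch: owner.TableName, IndexName and PartitionName are the same placeholders used in the rebuild snippet above, and the 5% and 30% thresholds are the commonly cited Books Online rule of thumb, not values from this post.

```sql
DECLARE @PartitionID int, @IndexID int, @frag float;

-- Active partition (assumes a datetime partitioning key, as above).
SELECT @PartitionID = $PARTITION.PartitionName(GETDATE());

SELECT @IndexID = index_id
FROM sys.indexes
WHERE object_id = OBJECT_ID(N'owner.TableName')
  AND name = N'IndexName';

-- Leaf-level fragmentation for that one partition; LIMITED is the cheapest scan mode.
SELECT @frag = avg_fragmentation_in_percent
FROM sys.dm_db_index_physical_stats(
         DB_ID(), OBJECT_ID(N'owner.TableName'), @IndexID, @PartitionID, 'LIMITED')
WHERE alloc_unit_type_desc = N'IN_ROW_DATA';

-- Rebuild heavily fragmented partitions, reorganize lightly fragmented ones.
IF @frag > 30
    ALTER INDEX IndexName ON owner.TableName REBUILD PARTITION = @PartitionID;
ELSE IF @frag > 5
    ALTER INDEX IndexName ON owner.TableName REORGANIZE PARTITION = @PartitionID;
```

If the partition has no rows, @frag stays NULL and neither branch runs.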

So, my lesson for this day?

Easy, think ahead about what will happen down the road.  Always check BOL and test, test, test. 

Happy Partitioning!