Statistics for partitions

As data volumes continue to expand and partitioning is further adopted in SQL Server 2008, the lack of control over statistics at the partition level will become a larger issue for more companies.

I filed this as a Connect item on Microsoft’s site in June of 2009.  Per the final comment from Microsoft, the more votes it receives, the higher its priority will become.

If you are reading this post, please vote here: https://connect.microsoft.com/SQLServer/feedback/details/468517/update-statistics-at-the-partition-level

I’m posting the connect details below. 

I’d like to see UPDATE STATISTICS modified to include partitionID in combination with the existing table or indexed view name.

With the partitioning improvements in SQL Server 2008, one of my clients that uses partitioning is seeing very large result sets. Having to update statistics against partitions that haven’t changed, just to refresh the one partition that actually needs it, creates unnecessary overhead. It also forces changes to plans that wouldn’t otherwise be altered, which causes memory churn in the proc cache.

One other related note is that you can defrag an individual partition index, but it will not update the statistics.

Proposed Solution
Two parts:

1.) Modify UPDATE STATISTICS to include an option for declaring either a partition name or partitionID for either a table or an indexed view.
For example:
UPDATE STATISTICS table_or_indexed_view_name
[
{
{ index_or_statistics__name }
{ index_or_statistics_name }
| ( { Partition_index_or_partition_statistics_name } [ ,…n ] )
}
]

2.) Upon completion of the defrag of an individual partition index, have it update the statistics for that partition.

 
Benefits
Improved Reliability
Improved Administration
Improved Performance
Update Statistics for a partition

Last week, I was working with a client and we wanted to update statistics on a single partition. After trying it out, then consulting BOL and finally Twitter, I determined that it could not be done. Thanks to Lara Rubbelke for her help and for pointing out the second issue discussed below.

In addition to that problem, when a partition’s index is defragmented, statistics are not updated. I suppose an easy workaround is to update statistics, but the problem is that it updates statistics for all of the partitions and/or indexes of that object, not the specific one that was just defragmented.
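
To make the workaround concrete, here is a minimal sketch. The table name dbo.SalesFact, the index name IX_SalesFact_OrderDate, and partition number 12 are hypothetical and only there for illustration. The reorganize touches a single partition, but the statistics update that follows has no partition option, so it covers the whole index.

```sql
-- Hypothetical names: dbo.SalesFact, IX_SalesFact_OrderDate, partition 12.

-- Defragment only one partition of the index.
ALTER INDEX IX_SalesFact_OrderDate ON dbo.SalesFact
REORGANIZE PARTITION = 12;

-- Workaround: refresh the statistics for that index afterwards.
-- UPDATE STATISTICS takes no partition argument, so the whole index
-- is scanned, not just partition 12.
UPDATE STATISTICS dbo.SalesFact IX_SalesFact_OrderDate WITH FULLSCAN;
```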

So, I created a suggestion at connect.microsoft.com. I’ll post the entire subject / issue below, but please visit the site, vote and validate this request.

This is an https:// link, so it will not work if you aren’t logged into Connect. That said, if you don’t have a login for the site, I’d recommend getting one; tons of great information can be found there. If you wish to search for it, go to connect.microsoft.com and type in “Update Statistics at the partition level” or search for feedback ID 468517.  https://connect.microsoft.com/SQLServer/feedback/ViewFeedback.aspx?FeedbackID=468517

Full description and proposed solution below:

Description:
I’d like to see UPDATE STATISTICS modified to include partitionID in combination with the existing table or indexed view name.

With the partitioning improvements in SQL Server 2008, one of my clients that uses partitioning is seeing very large result sets. Having to update statistics against partitions that haven’t changed, just to refresh the one partition that actually needs it, creates unnecessary overhead. It also forces changes to plans that wouldn’t otherwise be altered, which causes memory churn in the proc cache.

One other related note is that you can defrag an individual partition index, but it will not update the statistics.

Proposed Solution
Two parts:
1.) Modify UPDATE STATISTICS to include an option for declaring either a partition name or partitionID for either a table or an indexed view.
For example:
UPDATE STATISTICS table_or_indexed_view_name
[
{
{ index_or_statistics__name }
{ index_or_statistics_name }
| ( { Partition_index_or_partition_statistics_name } [ ,…n ] )
}
]
2.) Upon completion of the defrag of an individual partition index, have it update the statistics for that partition.

Benefits
Improved Reliability
Improved Administration
Improved Performance

Partitioning

Over the past few years I have implemented partitioning in a number of ways.  In the world of SQL Server 2000, it was “home-made”: I would create a number of tables with identical schemas, add check constraints, and then link all of the data together with views.  This method had tremendous upside, but as you can imagine, moving data from one “partition” to another was very time-intensive and expensive from an I/O perspective.
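
For anyone who never built one of these, a minimal sketch of the home-made approach follows. The table, constraint, and view names are made up for illustration, and a real implementation would have more columns and more member tables.

```sql
-- Hypothetical member tables: one per year, identical schema.
-- The CHECK constraint tells the optimizer which dates each table can hold.
CREATE TABLE dbo.Orders2008 (
    OrderID   INT      NOT NULL,
    OrderDate DATETIME NOT NULL
        CONSTRAINT CK_Orders2008_OrderDate
        CHECK (OrderDate >= '20080101' AND OrderDate < '20090101'),
    CONSTRAINT PK_Orders2008 PRIMARY KEY (OrderID, OrderDate)
);

CREATE TABLE dbo.Orders2009 (
    OrderID   INT      NOT NULL,
    OrderDate DATETIME NOT NULL
        CONSTRAINT CK_Orders2009_OrderDate
        CHECK (OrderDate >= '20090101' AND OrderDate < '20100101'),
    CONSTRAINT PK_Orders2009 PRIMARY KEY (OrderID, OrderDate)
);
GO

-- The view stitches the member tables back together into one logical table.
CREATE VIEW dbo.Orders
AS
SELECT OrderID, OrderDate FROM dbo.Orders2008
UNION ALL
SELECT OrderID, OrderDate FROM dbo.Orders2009;
GO
```

Moving rows from one member table to another in this layout means physically inserting and deleting them, which is where the I/O cost comes from.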

With SQL Server 2005, Microsoft introduced partitioning as part of the relational engine.  This was huge news for me specifically: I could actually do this “right” for my clients.  As many of you know, partitioning in SQL Server 2005 was a great option, but the parallelism limitation that accompanied the feature left a bad taste in the mouths of many.

This brings us to SQL Server 2008 where, based on my testing, the parallelism issue is truly resolved and the optimizer actually recognizes the partitions as it should.

So, what have I done?  I’ve implemented it all over the place.  That is the point of this post.  At one client, it’s worked just as I imagined that it would… at another, I’ve hit a bit of a stumbling block.  I did not realize the impact that non-clustered indexes would have if they weren’t created on the partition.  So what I ended up with, over time, were very, very heavily fragmented indexes.  In fact, some of them were so bad that the optimizer was choosing different paths to the data.  (That’s B.A.D.)

I thought my fix would be a tough one, but it turned out to be uber simple.  I dropped the indexes that existed on the primary filegroup and re-created them on the partition.  Voilà, piece of cake.  Everything was awesome until about two weeks later, when the fragmentation of those indexes caught up with me.  This left me cursing the amount of time it was going to take to rebuild these huge indexes and wondering how often I’d have to do it…
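
For illustration, the drop and re-create can be sketched roughly as below. Every name here is a placeholder: IndexName and owner.TableName stand for the index and table, while SomeColumn, PartitionSchemeName, and PartitionDateColumn are hypothetical stand-ins for the indexed column, the table's partition scheme, and its partitioning column.

```sql
-- All names are placeholders.

-- Drop the non-clustered index that lives on the PRIMARY filegroup.
DROP INDEX IndexName ON owner.TableName;

-- Re-create it on the partition scheme so that it is partitioned
-- the same way as the table.
CREATE NONCLUSTERED INDEX IndexName
ON owner.TableName (SomeColumn)
ON PartitionSchemeName (PartitionDateColumn);
```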

So, I stopped trying to do it my way and went to my favorite resource: BOL.  Of course, there is a way to do this, and it’s not painful.

It’s the function $PARTITION.  If you are using partitioning in your environment and you are unaware of this function, become aware. 

What this function enabled, in a very easy and straightforward manner, was for me to quickly and dynamically identify which partition was the active partition.  Once I had that information, I simply wrote an ALTER INDEX statement with REBUILD where the partition equaled my variable value.

Code snippet below.

DECLARE @PartitionID INT;

-- $PARTITION.<partition function name>(value) returns the partition number that value falls in.
SELECT @PartitionID = $PARTITION.PartitionName(GETDATE());

-- Rebuild only the active partition of the index.
ALTER INDEX IndexName ON owner.TableName
REBUILD PARTITION = @PartitionID;

-- Note that I can pass GETDATE() to $PARTITION because my partition function (key) is a datetime value.  Similar logic could be used if it were an int.

The obvious advantage to this is that when I run the rebuild index script, it is only running for the active partition.  This brings down the amount of time required for this action and enables me to run it nightly as the data becomes very fragmented from day to day in the current partition.
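
Before scheduling the nightly rebuild, it can be useful to confirm how fragmented the active partition really is. The sketch below is one way to do that. It reuses the placeholder names from the snippet above (PartitionName for the partition function, owner.TableName for the table) and is only an illustration.

```sql
DECLARE @PartitionID INT;

-- Find the partition number that today's date falls in.
SELECT @PartitionID = $PARTITION.PartitionName(GETDATE());

-- Report fragmentation for every index on the table, limited to that partition.
SELECT i.name AS index_name,
       ps.partition_number,
       ps.avg_fragmentation_in_percent,
       ps.page_count
FROM sys.dm_db_index_physical_stats(
         DB_ID(), OBJECT_ID(N'owner.TableName'), NULL, @PartitionID, 'LIMITED') AS ps
JOIN sys.indexes AS i
  ON i.object_id = ps.object_id
 AND i.index_id  = ps.index_id
ORDER BY ps.avg_fragmentation_in_percent DESC;
```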

So, my lesson for this day?

Easy: think ahead about what will happen down the road.  Always check BOL and test, test, test.

Happy Partitioning!