Meta Data – Cardinality Explored

June 18, 2009

Last night I had the privilege of speaking at the Colorado Springs SQL Server Users Group. The subject was Meta Data – Cardinality Explored.

This topic is a very enjoyable one to me. Along with the DMV’s that SQL Server now provides, the meta-data about your database implementations is critical to their scalability, stability, accuracy and performance.

Over the years, I have found that cardinality is not a topic that is often understood. I also believe that it’s not discussed all that often. It is an absolutely critical data-set for the developers, DBA’s, testers and performance testers out there.

Attached to this post you will find my sql script and PowerPoint slide deck. I’d recommend starting with the slide deck… walk through it and when you see “demo”, open up the sql file and learn how this result set can make an impact with the databases that you are responsible for or those that you are developing against.

Have fun and if you have questions about the presentation or the sql code I put together, feel free to comment!

Thanks again to the terrific folks down in Colorado Springs for the warm welcome and not falling asleep as I droned on and on. =)

CSSQLUG_June09
MetaData – Cardinality Explored – SQL


Schema Management (DDL Triggers)

April 29, 2009

Over the years I have used a variety of tools for schema management.  The best one that I really relied on for quite a while was from RedGate.  They offer schema compare and data compare tools that really make quick work of determing the delta against two environments.

Recently I’ve been working with DB Pro to manage this and thus far I’ve been very, very impressed.  However, it has some downside; primarly the cost.  Some of my clients just can’t justify / stomach the dollars involved.  So, in this gap I have been using DDL Triggers.  It’s an automated log, stored in a database (table) that captures ddl changes (Create, Drop, Alter).   This concept is different than a traditional trigger in that it only fires for database events, not data events.  Meaning that the scope of this trigger is at the schema or the structure level, not the data or the dml level.

The value that it has provided some of my clients is simply the certainty that they know what happened in a specific environment and it saves a great deal of time when it comes to debugging, troubleshooting or otherwise trying to figure out why something that worked perfectly yesterday in the database does not work correctly today.

There are a few steps that need to be taken to make this work.  The first is to determine if you want to store this data in it’s own database; or store it in a table in an existing database.  It can be done either way.  In the code below you will see that I’ve created a stand alone db with one table in it.

CREATE DATABASE [DDL_ChangeLog]

GO

use ddl_changelog;

CREATE TABLE DDLChangeLog (

DDLChangeLogID [int] IDENTITY(1,1) NOT NULL,

eventtype nvarchar(100),

objectname nvarchar(100),

objecttype nvarchar(100),

databasename nvarchar(100),

schemaname nvarchar(100),

SystemUser nvarchar(255),

AppName nvarchar(255),

HostName nvarchar(255),

tsql nvarchar(MAX),

createddate datetime,

CONSTRAINT [PK_DDL_Changelog] PRIMARY KEY CLUSTERED

(DDLChangeLogID ASC))
The code above will simply create a database and a table.

Once you have that in place, then choose the database that you would like to track ddl changes on and create this database trigger.

 

CREATE TRIGGER [trgLogDDLEvent] ON DATABASE

FOR DDL_DATABASE_LEVEL_EVENTS

AS

DECLARE @data XML

SET @data = EVENTDATA()

IF @data.value(‘(/EVENT_INSTANCE/EventType)[1]‘, ‘nvarchar(100)’) <> ‘CREATE_STATISTICS’

INSERT INTO ddl_changelog..DDLChangeLog

(EventType, ObjectName, ObjectType, DatabaseName, SchemaName, SystemUser, AppName, HostName, tsql , createddate)

VALUES (

@data.value(‘(/EVENT_INSTANCE/EventType)[1]‘,

                           ‘nvarchar(100)’),

@data.value(‘(/EVENT_INSTANCE/ObjectName)[1]‘,

                           ‘nvarchar(100)’),

@data.value(‘(/EVENT_INSTANCE/ObjectType)[1]‘,

                           ‘nvarchar(100)’),

@data.value(‘(/EVENT_INSTANCE/DatabaseName)[1]‘,

                           ‘nvarchar(100)’),

@data.value(‘(/EVENT_INSTANCE/SchemaName)[1]‘,

                           ‘nvarchar(100)’),

system_user , app_name (),host_name(),

@data.value(‘(/EVENT_INSTANCE/TSQLCommand)[1]‘,

                           ‘nvarchar(max)’) ,

getdate()) ;

GO

ENABLE TRIGGER [trgLogDDLEvent] ON DATABASE

GO
So, let’s test this out.

With the database that you just created the trigger on, let’s do something simple like this:

            SELECT 1 as test into TestTable;

           SELECT * from TestTable;

           DROP TABLE TestTable;

SELECT * FROM DDL_ChangeLog..DDLChangeLog

wow!  there is data there, not just any data, but it should have returned two rows; even though there are three statements above in our test.  This is because that trigger only fired for the Create table (select into) and the drop table statements.  The trigger will not fire for the select statement.

I initially started using these for my own reasons / purposes; however as time has gone on, I’ve found that there are other audiences that LOVE knowing this can be tracked… Auditors love this functionality, QA loves to be able to verify that the blackbox they are testing has some controls / audit processes around it.  Management loves it because they can have hard data to go back to the business on in the event of an outage, mistake or whatever.  I’ve found that it’s typically best to be as transparent as possible with changes / modifications and the state of the databases that I’m responsible for.  In this manner, I’m able to better serve my clients and in turn that are more likely to believe / trust my opinion when it counts.


vConference – This week!

April 20, 2009

This is the week that the vConference will air.  SQL Server WorldWide Users Group (SSWUG) puts on the event and this year I was chosen to speak at the event.

If you have not already registered, you can do so here : https://www.vconferenceonline.com/upcoming.asp 

I have a discount code for you to use as well : SVPJLOSP09

The recording of the event was very different from an in person conference.  It took me a few minutes to get used to the cameras and not have feedback from an audience.  That said, I’m happy with my presentations even though I moved a bit too fast and I felt that I didn’t focus on some of the points and bring them back around the way I was hoping for. 

At any rate, enjoy the conference, there is no better value for the money! 

Later this week I’ll have a couple of posts up about my sessions at Dev Connections.  They will get a lot more technical and it’s my hope, be of use for you.


Partitioning

April 16, 2009

Over the past few years I have implemented partitioning a number of ways.  In the world of SQL Server 2000, it was “home-made”.  Meaning that I would create a number of tables whose schema was identical, create check constraints and then link all of the data together with views.  This method had tremendous upside, but as you can imagine, the overhead ov moving data from one “partition” to another was very time intensive and expensive from an I/O perspective.

With SQL Server 2005, Microsoft introduced partitioning as part of the relation engine… This was huge news for me specifically.  I could actually do this “Right” for my clients.  As many of you know, paritioning has some limitations in SQL Server 2005 that made a great option, but the limitation of parallelism that accompanied the feature left for a bad taste in the mouths of many.

This brings us to SQL Server 2008 where, based on my testing, the parrallism issue is truly resolved and the optimizer is actually recoqnizing the partitions as it should. 

So, what have I done?  I’ve implemented it all over the place.  That is the point of this post.  At one client, it’s worked just as I imagined that it would… at another, I’ve had a bit of stumbling block.  I did not realize the impact that non-clustered indexes would have if they weren’t created with the partition.  So what I ended up with, over time were very, very, very heavily fragmented indexes.  In fact, some of them were so bad that the optimizer was choosing different paths to the data.  (That’s B.A.D.).

I thought my fix would be a tough one.  But it turned out to be uber simple.  I dropped the indexes that existed on the primary filegroup and re-created them on the partition.  Viola, piece of cake.  Everything was awesome, until about 2 weeks later when the fragmentation of those indexes caught up with me.  This left me cursing the amount of time it was going to take to rebuild these huge indexes and wondering how often I’d have to do it…

So, I stopped trying to do it my way and went to my favorite resource.  BOL.  Of course, there is a way to do this and it’s not painful.

It’s the function $PARTITION.  If you are using partitioning in your environment and you are unaware of this function, become aware. 

What this function enabled, in a very easy and straight foward manner was for me to quickly and dynamically identify which partition was the active partition.  Once I had that information, I simply wrote an alter index statement with rebuild where the partition equaled my variable value.

Code snippet below.

Declare @PartitionID INT

Select @PartitionID  =  (SELECT $PARTITION.PartitionName (getdate()));

ALTER INDEX IndexName ON owner.TableName

REBUILD Partition = @PartitionID;

    –  Note that I can utilize the getdate function with the $Partition function.  Since my partition function (key) is a datetime value.  Simple logic could be used if it were an int.

The obvious advantage to this is that when I run the rebuild index script, it is only running for the active partition.  This brings down the amount of time required for this action and enables me to run it nightly as the data becomes very fragmented from day to day in the current partition.

So, my lesson for this day?

Easy, think ahead about what will happen down the road.  Always check BOL and test, test, test. 

Happy Partitioning!


Time

April 13, 2009

 

Over the past five or six years I’ve put a great deal of thought into starting a blog… What would I write?  Would it be relevant?  Is it a waste of time?  etc… Many of you who read this have probably asked yourselves the same questions. 

As it turns out, I don’t have a great deal of margin or spare time in my daily life.  Even so, I believe that this discipline will help clients, associates, friends and perhaps even my mom better understand what I do for a living.

This blog will be technical in nature and more specifically it will focus on data becoming realized in an organization. 

I’ve been doing independent consulting for the most part over the past 7 years.  My focus over the past 10 years has been with relation database management systems (RDBMS).  I’ve had the opportunity to work with some of the best and brightest in the industry.  My specific focus is with SQL Server, from ground up development to migrations, performance tuning and everything in between.  I’ve thoroughly enjoyed it and learned a great deal.    

Ok, so onto some content worth reading. 

Much like this blog, I’ve found that time is often hard to come by.  I don’t mean time to watch the new episode of the apprentice or time to get in a round of golf after a work day and before the sun sets, but time on your servers. 

Time to dig into your systems and provide value above and beyond your job description.

Time to analyze your environment with cutting edge technology / architecture.

Time to upgrade your skill set to take advantage of the new technology.

Time to tune your databases and identify recurring trends.

Time to think … versus reacting.

Time ….

Over the years I have learned that unlike the majority of commodities in this world, time is not a forgiving entity. 

As this blog matures, I will detail on the tools that I use and the methods employed to best manage the time that I have and have it be the most productive time spent.  I will also write about anything going on in my life as a small business owner and consultant.

I look forward to posting all things data related and humbly thank you for taking the time to read.