Normally I write about stuff I know about… but this blog is an exception! I am writing about this to try and teach myself something. I’m not normally a risk-taker so you’ll have to be patient if you correct me. I was reading an article, which discussed quantum mechanics, and its application to Oracle and data warehousing. This blog hopes to explore some of the points raised there, and apply the principles instead to Microsoft SQL Server 2008.
As we discussed in an earlier blog, data about facts are held in dimension tables. The ‘look up’ data is held in dimensional tables, and the actual facts themselves are held in a fact table. How can dimensional tables be configured to cope with data or ‘real life’ changes?
Dimensional modeling implements a methodology called ‘Slowly Changing Dimensions’ or SCD in order to accommodate ‘real life’ changes about facts. It is possible to correlate SCD methodology with other areas of knowledge or data representation in other philosophical realms such as quantum mechanics. There are three main types of commonly used SCD: 1, 2, and 3. Each type of SCD tries to allow the dimensional designer to cope with changes. The relationship between SCD types, quantum mechanics and related implementation in Microsoft can be found next.
The Type 1 methodology simply overwrites old data with new data. Historical data is not tracked at all, and can be used when data errors are being corrected, for example, if an attribute was mis-spelled. According to the Copenhagen interpretation of quantum mechanics, this would mean that every measurement causes a fundamental change in the state; a waveform collapse. One criticism of this rationale for SCD and the Copenhagen interpretation is that it does not allow the state change reversal to the original state. Applied to dimensional modelling, this means that the original information is lost. This destructive behaviour on the original record may not be suitable for many business environments, which may need to preserve the history of the state over time. In Microsoft SSIS, the Type 1 implementation of the Slowly Changing Dimension transformation directs these rows to an output named Changing Attributes Updates Output.
In a Type 2 methodology, in terms of data warehouses, existing records cannot be changed. However, it is possible to allow an update to show that the record is current or expired. A change in an attribute results in the creation of new records, which have a primary key. Thus, a new record is created as a result of an attribute change as opposed to the update of an existing record. Since a new record is inserted each time a change is made, then the history is preserved.
In dimensional modelling, Type 2 corresponds to the Many Worlds interpretation of quantum mechanics. It allows events to reside next to one another by catering for many possibilities or worlds, which are not correlated with one another. This is similar to the Type 2 SCD since it serves as an analogy to the situation where a new situation or record is created to correspond to the new state. The analogy breaks down, however, since the ‘Many Worlds’ interpretation holds that the scenarios are unrelated, but in dimensional modeling the records are related to one another since the records share the same Business Key, even though they have a different Surrogate Key. In terms of future ‘worlds’, future events against a dimension record will be owned by the latest dimension record that changed via the Business Key, which ties the multiple instances together; so a relation will exist.
In SSIS, the Type 2 implementation of Slowly Changing Dimension transformation directs these rows to two outputs: Historical Attribute Inserts Output and New Output. Fixed attribute changes indicate the column value must not change. The Slowly Changing Dimension transformation detects changes and can direct the rows with changes to an output named Fixed Attribute Output.
In a Type 3 methodology, this involves a combination of the previous two interpretations. The Type 3 SCD can be expanded to include Type 2 functionality. At its lowest level, it is a case of duplicating attributes to have a ‘Current’ and ‘Previous’ version, and then using logic to set values where needed. In quantum mechanics, an analogous interpretation exists which is known as Consistent Histories interpretation of quantum mechanics. This interpretation holds that various worlds or states are possible, but it does not hold that an unending variation of potential multiple worlds can be produced from these states. In other words, the only outcome that is considered is the outcome that actually took place; this can be called the only true or real outcome.
Microsoft SSIS does not support Type 3 changes easily because it involves changes to the dimension table. However, it is possible to capture the data values that are candidates for Type 3 changes by identifying columns with a fixed attribute update type.
Microsoft SSIS does permit the implementation of Type 1 and Type 2 SCD, but care needs to be taken with the implementation of Type 3 methodology. Microsoft SQL Server 2005 Service Pack 2 (SP2), via package logging, does permit the additional logging of calls to the SCD transformation, which allows increased capacity to troubleshoot connections, commands and queries.
To summarise, the SCD types don’t correlate exactly to the various quantum mechanics interpretation of events; but I learned something along the way and perhaps it might spark your interest too?