Data Display: Lessons from Florence Nightingale

Florence Nightingale is possibly the most famous Victorian after Queen Victoria herself. She is perhaps best known for her achievements in the advancement of nursing at St. Thomas’ Hospital – which, in fact, is fairly near my office here in London! She is probably less well-known, however, for her achievements in statistics: Nightingale was an accomplished mathematician and statistician, and is now credited with the invention of a type of polar area chart, known as a Nightingale Rose or Coxcomb. Here’s an example:

Nightingale’s success may be due, in part, to her ability to get her point across by displaying data in a way that made its message clear to its consumers. She used her Rose diagrams to display data on seasonal sources of patient mortality in the Crimean military field hospital where she cared for soldiers. She aimed these illustrations at civil servants who had no background in mathematical or statistical reasoning, in order to substantiate her statements about the conditions of medical care.
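The design insight behind the Rose is worth a quick sketch: each wedge spans an equal angle, and the value is encoded in the wedge’s area rather than its radius, so the radius must scale with the square root of the value. A minimal Python sketch, using hypothetical figures rather than Nightingale’s actual data:

```python
import math

# Hypothetical monthly mortality counts -- not Nightingale's actual figures.
monthly_deaths = {"Jan": 120, "Feb": 80, "Mar": 45, "Apr": 20}

def rose_wedge_radii(values, max_radius=1.0):
    """Scale each wedge so its *area*, not its radius, encodes the value.

    A wedge spanning a fixed angle has area proportional to radius squared,
    so the radius must grow with the square root of the value.
    """
    peak = max(values)
    return [max_radius * math.sqrt(v / peak) for v in values]

radii = rose_wedge_radii(list(monthly_deaths.values()))
wedge_angle = 360 / len(monthly_deaths)  # each month gets an equal angle
```

Feeding these radii and angles into any polar charting tool (matplotlib’s polar axes, for instance) reproduces the Coxcomb shape.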
It’s possible to criticise the Coxcomb on the grounds that it’s like a pie chart, and we know how troublesome those critters can be for comparison and so on. Stephen Few has written an article about the difficulties in understanding pie charts; it’s a great read, and you can get it here. It’s easy to forget that, at the time, Nightingale’s statistical display of data was an incredibly innovative method of communicating this type of information. The results of her work in displaying the data are clear to see: her persuasiveness ensured that nursing changed to employ a more formal, structured education process, and to include a focus on both mind and body as part of the healing process. Further, her work had a direct impact on the care of her patients in the Crimean War.
It’s the same with any data; the point has to be made clearly. There are some great guidelines in Stephen Few’s blog, so I will refer you to the master himself. As mentioned in a previous blog post, the SSAS 2008 cube browser just does not display data in a user-friendly way. On the plus side, there are plenty of applications that allow you to show off your cube in a user-friendly way. Reporting Services, PerformancePoint and Report Builder are designed to do this, and some non-Microsoft goodies include Tableau, XLCubed and FractalEdge.
Take-away point: display your data well in order to say what you mean…

Chris Webb et al’s Latest Book on Expert Cubes in SSAS 2008 is now out!

Chris Webb, Marco Russo and Alberto Ferrari have produced a book on ‘Expert Cube Development in SSAS 2008’.
Personally I think it’s the ‘next step’ book for SSAS developers. Now that SSAS 2008 is out, there are probably people out there who, like me, have hands-on experience in building cubes, have read the performance guides and so on – but now it’s time to go on to the next level. For me, this book will fill that gap, and I can’t wait to get my hands on it. My latest MS BI project is due to kick off and, before it’s even at the planning stages, I’m already looking to sneak at least one cube in there somewhere – so hopefully the book will arrive before then, and that’s my weekend sorted. I couldn’t get it quickly enough on Amazon UK, but I’ve ordered it direct from the publishers, which you can do by clicking here.

Using Intelligent Laziness to deliver data using Analysis Services 2008

I’m a big fan of intelligent laziness when it comes to Business Intelligence. In another arena altogether, one famous adherent of ‘intelligent laziness’ was Napoleon Bonaparte. Napoleon was once said to have classified his soldiers into four types: the smart and ambitious, the dumb and ambitious, the dumb and lazy, and the smart and lazy. Each type was assigned a position in the army accordingly; Napoleon made the smart and lazy group his generals, whilst the dumb and ambitious, well, they usually ended up getting shot. Napoleon was smart: he knew that the smart and lazy group would be less likely to go ‘gung-ho’ into battle and thus lose him soldiers. The smart and ambitious, on the other hand, were more likely to go into battle to achieve glory, and thus took more risks with the lives of footsoldiers. Further, he believed that the smart and lazy would be more likely to find simple solutions rather than complex schemes, and less likely to react rashly and take risks.

It is easy to find examples of ‘intelligent laziness’ when it comes to Analysis Services cubes; to summarise, it is easier to do the calculations once, and store them in the cube for re-use. However, to help out users who (understandably) don’t like the thought of BIDS, many data charting and analysis applications give you the flexibility of storing these calculations at the presentation layer instead. For example, this week I was having a look at the Tableau reporting software, which allows you to quickly produce impressive-looking visualisations such as a range of charts, heat maps and so on. I started looking at this after showing my latest cube to someone using the Visual Studio cube browser. Their response was ‘ugh, why are you showing me the cube in that?’ – and I agree, the BIDS cube browser is, er, basic to say the least. Tableau, on the other hand, connected to my cube simply and allowed me to show off my Analysis Services 2008 cube using a range of graphs. This was really good since it also allowed me to analyse the cube, and I must say that I learned more about the underlying data by interacting with it, using Tableau as the interface.

Whilst Analysis Services cubes allow you to add in calculations, sometimes it is simpler for users to do it in data charting and analysis applications such as Excel, Tableau, XLCubed and so on. However, this can lead to repeated work on their part, since they need to copy and paste their calculations from one workbook to another. When this happens, it is best to put the calculations in the cube, so that they can be happily re-used from one workbook to another. Further, this means that you always get the same number when you query the data; it is a total nightmare if you get different numbers for the same business criteria, since the integrity of the data is immediately called into question, and the differences can be hard to explain. This, for me, is intelligent laziness: you do the work in the Analysis Services 2008 cube once, rather than repeat it in different workbooks. However, I completely understand why users don’t like writing MDX using BIDS; it’s not easy, and if your boss wants the numbers now, then it’s better all round just to do it quickly in Tableau, Excel and so on. If you’re interested, you could do the calculations in Excel, Tableau and so on, and trace the resulting Analysis Services 2008 MDX queries in Profiler whilst browsing the cube; this would allow you to ‘copy and paste’ the resulting MDX query, and perhaps learn MDX for yourself. Chris Webb wrote a blog about this subject here, and it is a good starting place.
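As a concrete illustration of ‘define once, re-use everywhere’, a calculated member can be added to the cube’s MDX script. This is only a sketch: the measure names below are hypothetical, not from any particular cube.

```mdx
// Hypothetical measures; define the margin once, in the cube itself.
CREATE MEMBER CURRENTCUBE.[Measures].[Gross Margin %]
AS  ([Measures].[Sales Amount] - [Measures].[Total Cost])
    / [Measures].[Sales Amount],
    // In production, wrap the division in IIF to guard against zero sales.
FORMAT_STRING = 'Percent',
VISIBLE = 1;
```

Every client – Excel, Tableau, Reporting Services – then sees the same [Gross Margin %] measure, so the numbers agree by construction rather than by careful copy-and-paste.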

The take-away point here is that laziness is sometimes a good thing, if it means finding simpler solutions rather than ‘wading in’ immediately when starting work on a difficult issue.

Data Mining: good data, then do the maths

Data mining can be loosely described as ‘searching for patterns in data’, but it is important to ensure that the data is properly in place before starting. It’s easy to get sidetracked by the selection of a data mining algorithm, for example; that’s obviously a core piece, but if the data isn’t in place, then we can’t be sure that the results will be correct. This emphasis on data collection and integrity hasn’t always been around, however. It was hard to think of a data mining example where the importance of data made interesting reading, so I looked instead to the history of astronomy: Tycho Brahe (1546–1601) laid the foundations for today’s astronomy by emphasising the rigorous and clean collection of data on the planets and stars, which was a real innovation at the time.

Tycho believed that astronomy could not be pursued through haphazard collations of astronomical data; it could only be understood through systematic collection, including redundant observations, which Tycho was prepared to put the work into completing. He conducted this study for almost twenty years – and, amazingly, entirely without a telescope, since his observations predate the refracting telescope (the Keplerian telescope, described by Kepler in 1611).

Significant to us today, however, is that Tycho produced the most accurate and systematic astronomical data of his time; he successfully recorded the positions of the planets to a very close degree of accuracy. Tycho systematically collected the triangulated locations of the planets and stars throughout the course of the year, believing that factual, observed data was the only way forward. He produced hundreds and hundreds of statements about the location of each planet over the course of the year, for example: ‘On the 15th March 1572, at 2.04am, the planet Mars was 32′38″ above the horizon, and 12′30″ west of the pole star’.

Tycho’s data was much sought after, and so was Tycho himself as a teacher. Eventually, Johannes Kepler became Tycho’s apprentice. Kepler took Tycho’s data and used it to derive his Laws of Planetary Motion; briefly, these laws showed that the planets move in elliptical orbits, rather than in circular ones. Newton once described himself as ‘standing on the shoulders of giants’, and rightly so, since he used Kepler’s work to inform his own work on gravity. Building on Tycho Brahe’s data, Isaac Newton (1642–1727) later deduced the fundamental mechanisms underlying the movements of the planets. Newton’s Three Laws of Motion (uniform motion, force = mass × acceleration, action–reaction), along with his Law of Universal Gravitation, can therefore be traced directly back to Tycho’s original observations.

Until the mid-18th century, the known planets were Mercury, Venus, Earth (obviously), Mars, Jupiter and Saturn. Uranus was discovered in 1781, and a detailed investigation of its orbit showed that it was not quite an ellipse; there was a slight deformation, which could be explained by the gravitational pull of yet another, unseen planet. In 1821, published tables of Uranus’s orbit confirmed the anomaly, and predictions of the unseen planet’s position were made using Newton’s mathematics; in 1846, Neptune was found, close to the predicted location.

Thus, Newton’s theory, applied to accurate data, could deduce the existence and orbit of a planet no-one had yet observed. The theory didn’t fit all the facts, however; Mercury’s orbit is also not a perfect ellipse. Newton’s theory required the presence of another planet, located between Mercury and the Sun, in order to pull Mercury’s orbit slightly away from its ellipse. As an aside, this hypothetical planet was proposed to be called Vulcan, which Star Trek fans will know as the home planet of Spock. No such planet was ever found, however; Mercury’s behaviour was eventually explained by Einstein’s General Theory of Relativity.

Central to this story, however, is Tycho’s stringent collection of data and his emphasis on rigorous data integrity. It is important to note that Tycho was an innovator in his determination to collect careful observations; until this point, no-one had done so. Without his work and his emphasis on clean data, these achievements in astronomy might have taken far longer.

To summarise, it is not just all about finding patterns; the data has to be right in the first place. There has to be enough data for training and testing, as little missing data as possible, and, where possible, repeatable observations. This applies not just to data mining, but to other spheres too.
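Those three requirements – held-out data for testing, minimal missing values, and repeatable observations – can be sketched as a few lines of Python run before any mining algorithm is chosen. The data and function names here are purely hypothetical:

```python
import random

def prepare_for_mining(records, test_fraction=0.3, seed=42):
    """Basic data checks before any mining algorithm is chosen.

    Drops records with missing fields, then holds out a test set so
    results can be validated rather than just 'found' in the data.
    """
    clean = [r for r in records if all(v is not None for v in r.values())]
    rng = random.Random(seed)        # fixed seed: the split is repeatable
    shuffled = clean[:]
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * (1 - test_fraction))
    return shuffled[:cut], shuffled[cut:]

# Hypothetical observations; the second has a missing value and is dropped.
data = [{"x": 1, "y": 2}, {"x": None, "y": 3}, {"x": 4, "y": 5},
        {"x": 6, "y": 7}, {"x": 8, "y": 9}]
train, test = prepare_for_mining(data)
```

Only once the clean training and test sets exist is it worth arguing about which algorithm to run over them.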