A Pyramid of Data Models

Frameworks and Methods

The different types of data model (and metadata model) are best viewed within the context of a governing framework – they all form part of an Enterprise’s view of its ‘architecture’ (using the term very loosely here), and are generally viewed as distinct layers within the ‘data’ stream of an architecture. For example, the Zachman Enterprise Architecture framework (see www.zachmaninternational.com) identifies four separate levels of data model – the Scope (contextual), Business (conceptual), System (logical), and Technology (physical) models. This framework is supported by several of the Enterprise Architecture tools on the market, with varying degrees of compliance. There is also a variety of development methodologies, such as Merise and Information Engineering, each with its own view of these layers.

The Pyramid

Many practitioners of data modelling use a pyramid to illustrate the different types of model that can be produced, and I make no apologies for doing the same.  This particular shape is well suited to the task for two reasons:

  • it supports the principles of ‘layers’ (you’ll see what I mean below)
  • as we move down the layers, everything grows – the number of models, their complexity, and the number of objects included

The pyramid consists of four tiers, providing increasing detail as you move towards the base.

6 thoughts on “A Pyramid of Data Models”

  1. Michelle Poolet September 20, 2013 / 15:23

    Sweet, concise, and visually, quite compelling…great job!

    • George McGeachie September 20, 2013 / 17:04

      Thanks, it’s a shame most vendors of data modelling tools only focus on the bottom two layers

      • Gordon Everest September 24, 2013 / 00:45

        Yes, every tool should provide abstraction mechanisms which can be used to decrease the level of detail when presenting a data model to humans in such a way that they can comprehend it.

  2. Gordon Everest September 24, 2013 / 00:53

    When will the myth end? Consider the Pyramid of Data Models: The layers can be characterized on two different and independent dimensions (which are commingled in the Pyramid):

    – DETAIL: Moving down the levels, the models contain increasing detail; conversely, moving up results in increased abstraction (that is, removing detail; not to be confused with “generalization”, which is often called abstraction, and wrongly so, IMHO).

    – SCOPE: Moving down the levels we have narrowing scope; conversely, moving up we broaden the scope. This is made quite clear with the added labels: enterprise, domain, and application on the top three layers.

    – Each level (and there could be many more) has its own physical manifestation (of the metadata), not just the bottom (physical) level. Can we, do we instantiate the models at higher levels?

    We need to distinguish the process we go through in developing data models from the end product of that process. I submit that the end product is a fully detailed, and widest scoped (enterprise?) data model. In practice this would consist of a collection of detailed, physical data models; after all, there are always relationships across databases.

    In the process of developing a data model, we choose our scope and within that, we choose the level of detail as we move toward the end product. We can develop data models top down, or bottom up, or bounce back and forth between the two approaches. We may start the process at a “high” level, a bird’s eye view (this seems to be the prevailing view), and gradually flesh out the detail. Alternatively, we can start with a list of data items of interest, as from a “Data Flow Diagram (DFD),” and go through a process of clustering and relating those items. In practice, we do both as we develop a data model.

    While we may characterize any given model in terms of its scope and its level of detail, neither of these characteristics implies a type of data model, as in the “pyramid.” Hence, we need to exclude both notions of scope and detail when we seek to develop a taxonomy of data models.

    I prefer to speak of a single data model (broadest scope, full detail) and its physical implementation (probably in several databases). Of course, we do not develop it all at once, but rather in a continuing series of data modeling projects. The resulting product of each data modeling project gets folded into the one, all encompassing (enterprise-wide), fully detailed data model. That model evolves over time. To be sure, that model exists in all organizations today – except that it is fragmented, and that is OK.

    We must distinguish the (underlying) data model from its presentation. Humans cannot comprehend a complete data model in all its detail (a computer can!). In fact, given the limitations of present-day DBMSs, some of the important “details” are not even included in the physical, implemented model, simply because we can’t, or perhaps we express and enforce them in code (triggers and stored procedures). So we must adopt strategies to enable people to understand a data model. One strategy is abstraction, or leaving out detail (call it “vertical abstraction”). Another strategy is partitioning, where we present a part of the whole model (call it “horizontal abstraction”). Now, if you want to present an enterprise data model, take the whole thing and remove a bunch of detail. The underlying model remains unchanged. If you want to present a model for a particular domain, take that piece of the full model, which in turn can be presented at increasing levels of abstraction (decreasing detail). Abstraction and partitioning are issues of presentation, not issues of modeling.

    In fact, I would argue that the presentation (documentation) of a data model should begin at a very high level of abstraction. In practice, I have never found a data model that could not be reduced to one or two major entities – that is where you begin the presentation to humans, and then gradually unfold the detail down to the level of interest to them. For example, business users may not be interested in the physical encoding of data item values, or in which fields are indexed. At a higher level, they may not need to know where the foreign keys are since that is simply a particular way to represent relationships in a relational database. At some level it may not be important that they know the identifiers used – the model simply shows a population of a particular entity/object. We know that at some point we must find a way (a surrogate) to uniquely identify the members of the population, but that is not necessary to understand the model. And when are the integrity constraints (business rules) important?

    In conclusion, let’s not classify data models on the basis of presentation choices. Whichever level you are at in the Pyramid, there is still one underlying, enterprise-wide (or more), fully detailed model. The implication in the Pyramid is that there are different models at each level. There are not. If we persist in using the Pyramid, let’s call it what it is: layers of presentation (of models), not layers (or types) of models.
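
The distinction drawn in this comment, one underlying model with abstraction and partitioning applied only at presentation time, can be sketched in a few lines of Python. This is an illustration under stated assumptions, not a description of any modelling tool: the names Attribute, Entity, MODEL, abstract and partition, the numeric detail scale, and the sample entities are all hypothetical and invented here.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Attribute:
    name: str
    detail: int  # hypothetical scale: 1 = business-level, 2 = keys, 3 = physical


@dataclass(frozen=True)
class Entity:
    name: str
    domain: str  # hypothetical label used for partitioning
    attributes: tuple = ()


# One underlying, fully detailed model (sample content is illustrative only).
MODEL = (
    Entity("Customer", "sales", (
        Attribute("name", 1),
        Attribute("customer_id", 2),
        Attribute("name_index", 3),
    )),
    Entity("Order", "sales", (
        Attribute("order_date", 1),
        Attribute("customer_id_fk", 2),
    )),
    Entity("Employee", "hr", (
        Attribute("name", 1),
        Attribute("employee_id", 2),
    )),
)


def abstract(model, max_detail):
    """Vertical abstraction: return a view that omits attributes above max_detail."""
    return tuple(
        replace(e, attributes=tuple(a for a in e.attributes if a.detail <= max_detail))
        for e in model
    )


def partition(model, domain):
    """Horizontal abstraction: return a view containing only one domain's entities."""
    return tuple(e for e in model if e.domain == domain)


# A high-level presentation of the sales domain; MODEL itself is never modified.
view = abstract(partition(MODEL, "sales"), max_detail=1)
for entity in view:
    print(entity.name, [a.name for a in entity.attributes])
```

Both functions return new views and leave MODEL untouched, which is the point being argued: the presentation changes, the underlying model does not.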

    • George McGeachie September 27, 2013 / 14:49

      Thanks for all the extra words, Gordon. You make up for my habitual lack of verbosity. I don’t know if any tool vendors have attempted to move from layers of models to layers of presentation, but I know it would be difficult to achieve without re-architecting their tools, and moving away from the model-focused view that they all have. I have seen a repository-based tool that regards entities, processes etc as independent objects, not relying on a model for context, but that still doesn’t provide the ability to create or navigate abstractions. I did overhear a discussion on the topic at EDW a few years ago, but nothing came of it.

  3. Harry January 3, 2014 / 09:11

    Having “live” transformations between various levels of abstraction would make the software increasingly complex and demand high development and maintenance costs. What is available today covers 80% of the needs with what could be 20% of the effort that would be required to support infinite levels of abstraction and flexible transformations between them.
