Subject Areas – an art form for data modellers and data governance? #PowerDesigner #datagovernance #ERStudio

(this article was originally published in 2011 – I’ve made a few minor amendments today, nothing to change the message)

Time spent using colours, font styles etc. can definitely increase both the clarity and usefulness of a model. If that use of colours and styles can be automated, even better. Previously, I’ve used macros in a data modelling tool to define styles for entity symbols that depend on the business owner, and font styles that vary according to when the data (master data in this case) is expected to be available. I also colour-coded relationships to denote the business area responsible for managing them, which is not always obvious. This information was all held as properties on the entity and relationship, making the macro pretty straightforward.

You can also use colours and styles to highlight entities or tables affected by a given release or change request; again, this is possibly metadata that is available for a macro to query.
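As a rough illustration of the kind of macro described above, here is a minimal sketch in plain Python. It does not use the scripting interface of any real modelling tool; the `Entity` class, the property names, and the colour and font mappings are all hypothetical stand-ins for metadata held on the entity.

```python
from dataclasses import dataclass


@dataclass
class Entity:
    # Hypothetical metadata held as properties on the entity
    name: str
    business_owner: str
    availability: str  # e.g. "available" or "planned"


# Assumed mappings; a real model would define its own owners and statuses
OWNER_FILL = {"Finance": "light yellow", "Exploration": "light green"}
AVAILABILITY_FONT = {"available": "regular", "planned": "italic"}


def symbol_style(entity: Entity) -> dict:
    """Derive the symbol style from the entity's metadata."""
    return {
        # Unknown owners fall back to a plain white symbol
        "fill": OWNER_FILL.get(entity.business_owner, "white"),
        # Unknown statuses fall back to the regular font
        "font": AVAILABILITY_FONT.get(entity.availability, "regular"),
    }


well = Entity("Well", "Exploration", "planned")
style = symbol_style(well)
```

The point of keeping the mappings in one place is that a change of palette, or a new business owner, only needs a change to the lookup tables, not to every diagram.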

One great use of colour-coding and styles is to categorise entities on a diagram, using colours to denote the owning subject area.

I recently examined a data model where all the entities were categorised into a number of subject areas, and there was a separate ERD for each subject area. Unfortunately, the model suffered from a complete lack of artwork. There was no colour coding, so I couldn’t tell which subject area any of the entities belonged to (though of course I could guess some of them), and therefore couldn’t be sure which ERD I needed to look at to see the full context of a given entity. To be sure that I was seeing the full picture for an entity, I had to use the main model ERD. This showed all the attributes for every entity and made no use of styles or colour whatsoever, so it was a difficult diagram to work with. Thankfully, it did fit onto a single sheet of A3 paper, as the model was quite small. In this data model, the subject areas were virtually unusable and irrelevant, because they weren’t being communicated at all.

When creating subject areas, think carefully about why you need them, and how users of the model should interpret them. You could break the model down into data-related subject areas, functional areas, systems of record, etc. With a really large model, you may be able to justify having multiple sets of subject areas for different purposes.

Take the example of subject areas based upon a higher-level data model; let’s say that the higher-level model includes 60 concepts, including:

‘Abstract Geography’ includes 15 entities, including ‘Company Location’, ‘Country’, and ‘City’. ‘Exploration’ includes 20 entities, including ‘Well’ and ‘Field’. Both ‘Well’ and ‘Field’ will have relationships to entities within ‘Abstract Geography’. Here’s part of the complete LDM:

Note that the colours tell you which subject area ‘owns’ each entity, and also which subject area is responsible for populating relationships. The diagrams were created in SAP PowerDesigner, which uses the triangle symbol on the relationship line to indicate a dependent entity. Here’s the same diagram in Idera ER/Studio Data Architect:

Subject Areas as art - ERStudio

Assume I have separate subject area ERDs for my concepts; where can I be sure of seeing all the relationships between the ‘Abstract Geography’ and ‘Exploration’ entities?  I would expect to see all of them on BOTH subject area ERDs.  It would even be possible to code a macro that tells you which entities should be on a given subject area ERD; again based on the metadata in the model.
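The check described above (which entities should appear on a given subject area ERD) could be sketched as follows. This is plain Python over made-up data, not a macro for any particular tool. The `Relationship` class, the `owner_of` mapping, and the specific relationships listed are assumptions for illustration. The rule applied is the one stated in the text: an ERD shows the entities its subject area owns, plus the entity at the other end of every relationship that touches them.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Relationship:
    parent: str
    child: str


# Illustrative metadata only: which subject area owns each entity
owner_of = {
    "Country": "Abstract Geography",
    "City": "Abstract Geography",
    "Company Location": "Abstract Geography",
    "Well": "Exploration",
    "Field": "Exploration",
}

# Illustrative relationships; the real model would supply these
relationships = [
    Relationship("Country", "City"),
    Relationship("Company Location", "Well"),
    Relationship("Country", "Field"),
]


def entities_for_erd(subject_area, owner_of, relationships):
    """Return the entities expected on the ERD for one subject area."""
    owned = {e for e, area in owner_of.items() if area == subject_area}
    expected = set(owned)
    for rel in relationships:
        # Any relationship touching an owned entity brings in both ends
        if rel.parent in owned or rel.child in owned:
            expected.update((rel.parent, rel.child))
    return expected


exploration = entities_for_erd("Exploration", owner_of, relationships)
geography = entities_for_erd("Abstract Geography", owner_of, relationships)
```

With this sample data, the ‘Exploration’ result contains ‘Country’ and ‘Company Location’ as well as ‘Well’ and ‘Field’, and every cross-area relationship appears on both ERDs. Comparing the result with the symbols actually present on a diagram would give a list of missing or surplus entities.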

Is any of this art?  No, but it is a combination of graphic design, common sense, and ergonomics, with a tasty dash of automation to make it more palatable.

3 thoughts on “Subject Areas – an art form for data modellers and data governance? #PowerDesigner #datagovernance #ERStudio”

  1. John Owens Dunedin March 17, 2011 / 22:42

    Hi George

    The two concepts that you address here, levels of hierarchy and subject areas, are laced with misunderstanding and fraught with danger.

    The idea of having “high level” and “low level” data models should be avoided at all costs as it suggests a hierarchical structure, which ERDs do not have. The very power of ERDs is that they are NOT hierarchical in structure, but networks.

    Also, no data entity belongs to a single subject area. Subject areas are not absolute; they are merely a useful device for grouping data entities for a particular purpose.

    For example, you might want to display on the ERD all of the entities that are included in the area of Invoicing or you might want to look at the entities that are included in the area of Customer Relationship Management. There would be a major overlap in the entities that would appear in both these “subject areas”.

    The use of the CRUD Matrix is a very powerful mechanism for getting these onto any particular diagram. This would also be very easy to automate.

    One other way would be to build a meta model assigning entities to subject areas – this would be a many-to-many structure. This is less desirable than using the CRUD matrix as it requires manual maintenance and errors can, and do, occur.


    • George McGeachie March 18, 2011 / 10:31

      Thanks for the response, John.
      Forgive me, but I obviously didn’t make it clear that I’m not advocating a hierarchical structure WITHIN an ERD. My example is based on a real-life example, where an enterprise data model described data without going into very much detail. So there was a concept called ‘Exploration’, for example, which was itself broken down into a few more specific concepts, again without the detail you’d expect to see in an LDM. There was a more detailed conceptual data model, telling us more about the master data required to support each concept, and also separate application LDMs. We used colour-coding on all our ERDs to communicate information, in this case the ‘owning’ concept. So, there was a three-level hierarchy of data models, and they suited our purpose at the time. On this occasion, we used the font style used in entity names in the conceptual data model to indicate the status of the master data entities – a macro read the metadata in the entity to set the colour and font style. Other tools allow you to define stereotypes with associated styles. In my blog posting, I simplified the situation, and used it to illustrate the use of colour to communicate information about the entities.

      There are definitely multiple reasons for creating subsets of a data model, like ‘all the entities supporting process X’, ‘all the data in application Y’, or ‘all the entities changed this year’, which of course means that any given entity can appear in many subsets of the model. I don’t use the term ‘subject area’ to refer to these model subsets.
      When I use the term ‘subject area’, I’m specifically referring to partitioning a data model to facilitate model management. Once determined, these subject areas are usually pretty well fixed for the life of the model. Of course, each subject area has its own ERD. If the complete LDM is a large one, we might never show anyone the full LDM ERD.

      For example, a Logistics LDM may have subject areas for ‘Shipment’, ‘Customs and compliance’, etc. These may be derived from functional analysis, data analysis, process modelling, whatever makes the most sense in the organisation. We have to be careful how we construct these subject area ERDs, to make sure that they are interpreted correctly, and that we spend the optimal amount of time maintaining them. If I’m looking at the ‘Shipment’ subject area ERD, I expect it to include all the entities needed to describe shipments, with all their relationships. I wouldn’t expect it to contain any entities not needed to describe shipments. In the example I’ve used, I would expect the ‘Exploration’ ERD to include the ‘Country’ and ‘Company Location’ entities, so I can see the complete picture. If the ‘Exploration’ ERD only contained the ‘Exploration’ entities, and I printed it for review, the reviewer would be at a disadvantage; they’d see that ‘Well’ had a foreign key from ‘Company Location’, but they wouldn’t be sure which entity it came from, nor would they see any relationship role names to tell them why it’s there. If the child entity for a missing relationship was not in the ERD, they would have no way of knowing that it existed, without checking some other documentation.
