Attributes versus Relations: Making the Choice

Modeling is more art than science; there are always tradeoffs to weigh. One of the more interesting ones is whether to model a given data point as an attribute of an object or as a separate object that relates to it. For example, should the operating region of a process be an attribute, or a relation to a location object? And how should we deal with servers that have multiple IP addresses?

There are several considerations that inform the choice.

The first consideration is visual complexity. Every time such information is represented as an object plus a relation, the model becomes more complicated when it is displayed as a graph. Graph views like this are common in most tools, from free suites like Archi to commercial tools costing five or six figures. Such a view is useful for exploring an architecture, but every extra entity adds clutter. So there is something to be said for favoring attributes over relations, when other considerations don’t override this.
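To make the visual cost concrete, here is a minimal sketch that counts the elements a graph view would have to draw for the same piece of information modeled both ways. The `Model` class, its methods and the element names are hypothetical illustrations, not the API of Archi or any other tool; the sketch simply assumes that attributes are not drawn while nodes and edges are.

```python
from dataclasses import dataclass, field


@dataclass
class Model:
    """Hypothetical minimal model: named elements plus typed relations."""
    elements: dict[str, dict[str, object]] = field(default_factory=dict)
    relations: list[tuple[str, str, str]] = field(default_factory=list)

    def add_element(self, name: str, **attributes: object) -> None:
        self.elements[name] = dict(attributes)

    def relate(self, source: str, kind: str, target: str) -> None:
        self.relations.append((source, kind, target))

    def graph_size(self) -> int:
        # Assumption: attributes are not drawn; only nodes and edges are.
        return len(self.elements) + len(self.relations)


# Region held as an attribute: one node, no edges.
as_attribute = Model()
as_attribute.add_element("Order Fulfilment", region="EMEA")

# Region held as a related object: two nodes and one edge.
as_relation = Model()
as_relation.add_element("Order Fulfilment")
as_relation.add_element("EMEA")
as_relation.relate("Order Fulfilment", "operates in", "EMEA")

print(as_attribute.graph_size(), as_relation.graph_size())  # 1 3
```

Multiply that difference across every process and every region, and the graph view grows quickly.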

There are some obvious cases for keeping a data point as an attribute: where its values are strictly quantitative or ordinal. That is, if an attribute’s values can be expressed as a number or on a scale, it should stay an attribute. If the possible values are ‘high’, ‘medium’ and ‘low’, or a figure between 0 and 10, then that attribute is not a good fit for a relation.
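As an illustration, here is a minimal sketch of ordinal and numeric values held as plain attributes. The `Criticality` scale and the 0-10 `maturity` score are hypothetical examples; the point is that filtering and ranking work directly on the attribute values, without adding "High" or "Low" objects to the model.

```python
from dataclasses import dataclass
from enum import IntEnum


class Criticality(IntEnum):
    """An ordinal scale: the values have a natural order, so comparisons make sense."""
    LOW = 1
    MEDIUM = 2
    HIGH = 3


@dataclass
class Process:
    name: str
    criticality: Criticality  # ordinal value, kept as an attribute
    maturity: int             # hypothetical 0-10 score, also kept as an attribute

    def __post_init__(self) -> None:
        if not 0 <= self.maturity <= 10:
            raise ValueError("maturity must be between 0 and 10")


processes = [
    Process("Billing", Criticality.HIGH, 7),
    Process("Onboarding", Criticality.MEDIUM, 4),
    Process("Archiving", Criticality.LOW, 9),
]

# Filter and rank directly on attribute values.
critical = [p.name for p in processes if p.criticality >= Criticality.MEDIUM]
by_maturity = [p.name for p in sorted(processes, key=lambda p: p.maturity, reverse=True)]
print(critical)     # ['Billing', 'Onboarding']
print(by_maturity)  # ['Archiving', 'Billing', 'Onboarding']
```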

The next consideration comes from tooling and reporting (although, in my humble opinion, these are often the same thing, analytics being one of the primary benefits of a tool). Some tools impose a strict meta-model, so that any attribute field must apply to all objects of that class. Others allow attributes to be added ad hoc to specific object instances. This distinction matters when addressing the one-to-many situation, such as the process-region question mentioned above. However, since a primary purpose of tooling is analytics, per-instance attributes are of limited use for our discussion unless the tool can also report across them.

In such a case, a good rule of thumb is this: when a supposed attribute can have multiple values, and the number of those values is indeterminate, it is a good candidate for a relation to another object. To return to the example above, process regions are a good candidate for relations rather than attributes, since a given process could be related to one region or a dozen, depending on circumstances.
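Here is a minimal sketch of the process-region example modeled as relations. Relations are stored as (source, type, target) triples, so a process can carry any number of "operates in" relations, and a report across all processes is a single pass over them. The region and process names, the relation type string and `processes_by_region` are all hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Region:
    name: str


@dataclass(frozen=True)
class Process:
    name: str


# Each process can have any number of "operates in" relations.
emea, apac, amer = Region("EMEA"), Region("APAC"), Region("AMER")
billing, payroll = Process("Billing"), Process("Payroll")
relations = [
    (billing, "operates in", emea),
    (billing, "operates in", apac),
    (billing, "operates in", amer),
    (payroll, "operates in", emea),
]


def processes_by_region(rels):
    """Report across all processes: which processes operate in each region."""
    report = defaultdict(list)
    for source, kind, target in rels:
        if kind == "operates in":
            report[target.name].append(source.name)
    return dict(report)


print(processes_by_region(relations))
# {'EMEA': ['Billing', 'Payroll'], 'APAC': ['Billing'], 'AMER': ['Billing']}
```

Adding a thirteenth region to a process is just one more relation; no schema or attribute field has to change.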

The third question to ask concerns second-order relations. In other words, the attribute in question relates to the original object, but could it also relate to other objects, or is it simply a property of the primary object alone? In the latter case, there’s a strong argument for keeping the attribute as… an attribute.
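The rules of thumb in this post can be roughly condensed into a small decision helper. This is a sketch under stated assumptions: the `DataPoint` fields, the `suggest` function and the order of precedence between the rules are my own illustration, and the example inputs (for instance, treating a server’s IP addresses as multi-valued but not shared) are hypothetical judgments, not fixed answers.

```python
from dataclasses import dataclass


@dataclass
class DataPoint:
    """Hypothetical description of a candidate data point."""
    name: str
    quantitative_or_ordinal: bool    # e.g. a 0-10 score or high/medium/low
    multi_valued: bool               # can hold an indeterminate number of values
    shared_with_other_objects: bool  # could also relate to objects besides its owner


def suggest(dp: DataPoint) -> str:
    """Apply the rules of thumb in an assumed order of precedence."""
    if dp.quantitative_or_ordinal:
        return "attribute"
    if dp.multi_valued or dp.shared_with_other_objects:
        return "relation"
    # Default: favor attributes to keep the graph view simple.
    return "attribute"


candidates = [
    DataPoint("criticality", True, False, False),
    DataPoint("operating region", False, True, True),
    DataPoint("IP address", False, True, False),
    DataPoint("internal process code", False, False, False),
]
for dp in candidates:
    print(f"{dp.name}: {suggest(dp)}")
# criticality: attribute
# operating region: relation
# IP address: relation
# internal process code: attribute
```

In practice these are prompts for judgment rather than hard rules; the helper only makes the reasoning explicit.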

Ultimately, modeling anything is still more art than science, and may always remain so. This is just as true for the choice between modeling certain information as attributes or as relations. Models are, at their heart, an attempt to gain greater understanding of a highly complex situation. However, by applying the rules I’ve outlined in this post, you can approximate an optimal approach.