What data should we import?


One of the perpetual aggravations for a technical person is being asked to do something simply because it is something. I've had to create documents that no-one will read, and attend meetings that I added nothing to. The worst example was where I had to give up a day of my life for a meeting with the explicit purpose of 'well, it's just a case of getting bums on seats, really'. Alas, I sometimes see the same behavior from organizations implementing tools.

Summary: Pretty much all modeling tools today have facilities for importing information, both by importing existing diagrams in Visio and other formats and through more general data import capabilities. However, just because a tool makes this possible does not necessarily mean that you should take advantage of it. Issues such as data quality, data freshness and the actual effort involved come into play. The deciding factor should be: will the information being imported be used for something, or is this simply importing data for the sake of having data?

Let's talk about legacy data first of all. Generally this is in the form of diagrams in Visio or PowerPoint, lists in Excel and, sometimes, Word documents. Now, while importing such documents might seem a way to leverage information that the organization has invested time in, that's only guaranteed if the information is both up to date and immediately understandable to the person importing it. Is this the case? Probably not.

Information that is out of date, even if only partially, has obvious problems. Taken as is, it presents a misleading picture. Well, perhaps we can just clean out the outdated information? Unfortunately, that presupposes that a) there's a good picture of what information is outdated, and b) removing it doesn't make the rest of the model misleading. In practice, bringing in a legacy diagram generally means one of two things – a cleanup for each diagram as it’s imported, or quarantining the information in some kind of 'historical reference' section.

The best way to decide whether an import is worth it is to ask Kipling's Six Serving Men – also known as the 'W's. Who will use this diagram? When will they use it? What for? Why will it be useful? And so on.

There are also issues, albeit different ones, with importing from other repositories. First of all, there are the natural costs of setting up the import – defining XML transformations, writing Excel macros or tool macros, configuring the imports – essentially setting up the whole ETL (Extract, Transform and Load) that is required. But these are merely mechanical issues.

A wider issue is one of information format and scope. To put it simply, what will you import? Everything in the other repository? In practice, this is probably a bad idea. An enterprise architecture tool has no need to know the IOS level for every switch on the network, for example.
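As a rough illustration of scoping an import, here is a minimal Python sketch that keeps only the fields someone has given a reason for. The field names, the purposes and the sample record are all made up for the example; no particular tool or repository format is assumed.

```python
# Hypothetical whitelist: every imported field must name the question it helps answer.
IMPORT_FIELDS = {
    "name": "identify the device in architecture diagrams",
    "location": "plan data-centre consolidation",
    "owner": "know who to ask about changes",
}


def select_for_import(record):
    """Keep only the fields that have a stated purpose; drop everything else."""
    return {key: value for key, value in record.items() if key in IMPORT_FIELDS}


# Made-up source record, as it might arrive from another repository.
switch = {
    "name": "core-sw-01",
    "location": "London DC2",
    "owner": "Network Ops",
    "ios_version": "15.2(4)E",  # detail the architecture tool has no use for
    "serial_number": "ABC123",
}

imported = select_for_import(switch)
```

The point of writing the purpose next to each field is that a field nobody can justify never makes it into the list in the first place.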

A second issue is shared with legacy diagrams – how accurate and up to date is the information? In an ideal world, any master repository, such as a CMDB, is completely accurate and up to date, but in the real world, this is not that common.
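One way to get a feel for freshness before committing to an import is to count how many records have not been verified recently. The sketch below assumes each record carries a `last_verified` date, which many repositories will not have, and the 180-day threshold is an arbitrary placeholder rather than a recommendation.

```python
from datetime import date, timedelta


def stale_records(records, today, max_age_days=180):
    """Return the names of records whose last verification is older than the threshold."""
    limit = timedelta(days=max_age_days)
    return [r["name"] for r in records if today - r["last_verified"] > limit]


# Made-up sample data; 'last_verified' is a hypothetical field.
sample = [
    {"name": "billing-app", "last_verified": date(2024, 3, 1)},
    {"name": "legacy-crm", "last_verified": date(2023, 1, 15)},
]

# 'today' is passed in explicitly so the check is repeatable.
outdated = stale_records(sample, today=date(2024, 6, 1))
```

If most of the repository turns up in that list, that is a strong hint that the import will need a cleanup effort of its own.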

Again, it is hard to identify the cleavage points when deciding how to address these issues – and again, the best approach is to refer to the six serving men. What decisions will depend on the imported data? How will you resolve conflicts between the two repositories?

There's an old joke about the politician's syllogism: “Something must be done; this is something; therefore we must do it.” In the same way, the attractive idea of reusing legacy information and importing from other repositories can actually be a waste of time – unless you check your 'W's.