Data Architecture | Ontology Development

In software-development & data architecture nirvana, the business analysts, database technologists, and application developers all speak the same language.  Everyone agrees about what each user story means.  Everyone knows what’s in each database table and column, just by looking at them.  The source code practically explains itself.  Nobody creates database tables that never get used.  Nobody writes orphaned code.

Sound too good to be true?  Not really.  It’s not even that hard.  To do it, you just need to add two documents and a few straightforward steps to your agile/scrum development process.  Here’s how.

Have your data architect maintain an application ontology.

The first new document is an application ontology, a catalog of the entity types (classes of things of interest) that appear in your user stories.  Each entity type has its own entry in the ontology.  The entry specifies the entity type’s

  • official name,
  • description,
  • attributes (properties), and
  • (logical) relations with other entity types.

The entry may also indicate whether the entity type is a business event.

For each attribute, the entry specifies the attribute’s

  • official name,
  • data type,
  • nullability (whether the attribute can have a null, i.e. unknown, value), and
  • set or list of allowed values.

An attribute must be atomic; it must have a single value of an elementary data type, such as an integer, decimal number, or string.

Finally, the entry specifies the cardinality of each relation (how many of each entity type may or must participate in the relation), and whether the relation is mandatory or optional.

Suppose, for example, that your team is building an application that tracks over-the-counter sales for a retailer.  The application’s ontology would have an entry for the line-item entity type, like this:

Name: Line item
Description: A line item represents the total quantity of a given product type sold in a given (parent) sales transaction.
Business event: No
Attributes:
  • quantity (positive integer, non-null)
  • list price in USD (positive decimal number, non-null)
  • total discount percentage (non-negative decimal number, nullable)
  • extension in USD (positive decimal number, non-null)
Relations:
  • One line item refers to exactly one product type.
  • One line item may apply several discount types.
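If you keep the ontology in a form that scripts can read, the same entry might look like the sketch below.  It is only an illustration under assumptions of my own: the class names (Attribute, Relation, EntityType) are hypothetical, and cardinality is encoded as a minimum and maximum count, where a minimum of 1 means the relation is mandatory and a maximum of None means “many”.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Attribute:
    name: str                 # official attribute name
    data_type: str            # elementary (atomic) type, as written in the ontology
    nullable: bool            # may the value be null, i.e. unknown?
    allowed_values: Optional[Tuple[str, ...]] = None  # None: no enumerated value set


@dataclass(frozen=True)
class Relation:
    description: str          # the relation as the ontology states it
    target: str               # official name of the related entity type
    min_count: int            # 0 = optional, 1 or more = mandatory
    max_count: Optional[int]  # None = "many"

    @property
    def mandatory(self) -> bool:
        return self.min_count >= 1


@dataclass
class EntityType:
    name: str
    description: str
    business_event: bool
    attributes: List[Attribute] = field(default_factory=list)
    relations: List[Relation] = field(default_factory=list)


# The line-item entry from the example, expressed with the hypothetical classes above.
LINE_ITEM = EntityType(
    name="line item",
    description=(
        "A line item represents the total quantity of a given product type "
        "sold in a given (parent) sales transaction."
    ),
    business_event=False,
    attributes=[
        Attribute("quantity", "positive integer", nullable=False),
        Attribute("list price in USD", "positive decimal number", nullable=False),
        Attribute("total discount percentage", "non-negative decimal number", nullable=True),
        Attribute("extension in USD", "positive decimal number", nullable=False),
    ],
    relations=[
        Relation("One line item refers to exactly one product type.", "product type", 1, 1),
        Relation("One line item may apply several discount types.", "discount type", 0, None),
    ],
)
```

A spreadsheet or wiki page carries the same fields; a structured form simply lets later checks read the ontology mechanically instead of by eye.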

You can maintain an application ontology as a Word document, a spreadsheet, or a wiki, or in an ontology editor such as Protégé.  Each has its advantages.

Have your business analysts maintain a traceability matrix that relates user stories and entity types.

This kind of traceability matrix has one row for each user story, and one column for each entity type.  The value at the intersection of row i and column j indicates whether user story i requires entity type j.
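Here is a minimal sketch of such a matrix, assuming each user story’s row is stored as the set of entity types it requires.  The story IDs (US-1, US-2) and their requirements are hypothetical.  Reading a column tells you which stories need an entity type, and an empty column flags an entity type that no story uses, which is exactly the kind of table nobody should be building.

```python
# Hypothetical entity types (columns) and user stories (rows), for illustration only.
ENTITY_TYPES = ["line item", "product type", "sales transaction", "discount type", "merge"]

STORY_REQUIREMENTS = {
    "US-1": {"line item", "product type", "sales transaction", "merge"},
    "US-2": {"line item", "sales transaction"},
}


def traceability_matrix(stories, entity_types):
    """Row i, column j is True when user story i requires entity type j."""
    return [[entity_type in required for entity_type in entity_types]
            for required in stories.values()]


def stories_requiring(stories, entity_type):
    """Read one column of the matrix: the stories that require an entity type."""
    return [story for story, required in stories.items() if entity_type in required]


def unused_entity_types(stories, entity_types):
    """Entity types that no story requires, i.e. empty columns."""
    used = set().union(*stories.values())
    return [e for e in entity_types if e not in used]


for story, row in zip(STORY_REQUIREMENTS, traceability_matrix(STORY_REQUIREMENTS, ENTITY_TYPES)):
    print(story, " ".join("x" if cell else "." for cell in row))
```

With this data, the loop prints “US-1 x x x . x” and “US-2 x . x . .”, and “discount type” is the only unused entity type.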

Have your business analysts review each user story to make sure it’s expressed in terms of entity types.

All of the (pro)noun phrases and verb phrases in a user story should be names of entity types.  When this is true, anyone in the project team who is familiar with the ontology should be able to read the user story and know exactly what it means, and exactly what it requires.  The story is unambiguous.

Your business analysts will generally need to make three kinds of changes while reviewing a user story.

  1. Replace synonyms for entity-type names with the official entity-type names.  (That is, eliminate all synonyms for entity-type names.)
  2. Identify terms that represent classes of things of interest which the ontology does not yet identify as entity types, and have the data architect add those entity types to the ontology.
  3. Identify language that requires adding attributes or relations to an existing entity type, and have the data architect add them to the entity type in the ontology.

For example, suppose a user story for the next iteration says

When the cashier rings up two separate charges for the same item, the system merges the two charges into a single line item.

A business analyst might review the story and find several problematic terms.  First, the story uses ‘charge’ as a synonym for ‘line item’, and ‘item’ as a synonym for ‘product type’, so the analyst replaces the synonyms with the official names:

When the cashier rings up two separate line items for the same product type, the system merges the two line items into a single line item.
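The synonym-replacement step can be partly automated.  The sketch below assumes a hypothetical synonym table that the business analyst maintains alongside the ontology.  Official names map to themselves and the longest terms are tried first, so the ‘item’ inside ‘line item’ is never rewritten a second time.

```python
import re

# Hypothetical synonym table built from the example story: each key is a term found
# in a draft story, each value is the official entity-type name (or its plural).
# Official names map to themselves so they are left untouched.
SYNONYMS = {
    "line items": "line items",
    "line item": "line item",
    "charges": "line items",
    "charge": "line item",
    "items": "product types",
    "item": "product type",
}

# Longest terms first, so "line item" wins over "item" at the same position.
_PATTERN = re.compile(
    r"\b(" + "|".join(re.escape(t) for t in sorted(SYNONYMS, key=len, reverse=True)) + r")\b"
)


def replace_synonyms(story: str) -> str:
    """Replace every known synonym in a user story with its official entity-type name."""
    return _PATTERN.sub(lambda m: SYNONYMS[m.group(1)], story)
```

Applied to the original story, replace_synonyms returns the revised sentence above.  It only catches synonyms already in the table; spotting new ones is still the analyst’s job.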

Second, the business analyst realizes that the merge operation is a business event that does not yet appear as an entity type in the ontology.  The business analyst asks the data architect to add the entity type to the ontology.  They agree to make ‘merge’ the entity type’s official name.  The story’s wording does not change, but the meaning of ‘merges’ is now clear:

When the cashier rings up two separate line items for the same product type, the system merges the two line items into a single line item.

Finally, the business analyst realizes that the story is vague about whether the two line items belong to the same sales transaction, and that the line-item entity type needs an additional relation connecting each line item to its parent sales transaction.  The business analyst points this out to the data architect, who adds the relation ‘Many line items may belong to one sales transaction.’ to the line-item entity type’s list of relations.  The business analyst also adds the relation to the story, to make clear that the ring-up operation groups a set of line items into a single sales transaction:

When the cashier rings up two separate line items for the same product type in the same sales transaction, the system merges the two line items into a single line item.

The business analyst can now parse the sentence into entity types, to verify that the sentence is unambiguous:

When the [cashier] [rings up] two separate [line items] for the same [product type] in the same [sales transaction], the [system] [merges] the two [line items] into a single [line item].
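Once a story is bracketed like this, checking the brackets against the ontology is mechanical.  The sketch below assumes a hypothetical set of official names (including ‘ring up’ for the ring-up operation) and a hypothetical table that maps inflected forms such as ‘merges’ back to official names.  It only checks the bracketed phrases; confirming that every noun and verb phrase is bracketed still takes a careful reader.

```python
import re

# Hypothetical excerpt of the ontology's official entity-type names.
ENTITY_TYPES = {
    "cashier", "ring up", "line item", "product type",
    "sales transaction", "system", "merge",
}

# Hypothetical table mapping inflected forms used in sentences to official names.
INFLECTIONS = {"rings up": "ring up", "line items": "line item", "merges": "merge"}

BRACKETED = re.compile(r"\[([^\]]+)\]")


def unknown_terms(parsed_story: str) -> list:
    """Return the bracketed phrases that do not name an entity type in the ontology."""
    return [
        term
        for term in BRACKETED.findall(parsed_story)
        if INFLECTIONS.get(term, term) not in ENTITY_TYPES
    ]
```

An empty result means every bracketed phrase resolves to an entity type; anything returned is either a leftover synonym or a candidate for a new entity type.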

All of the phrases in square brackets are entity-type names.  They constitute all of the sentence’s (pro)noun phrases and verb phrases, so the sentence is unambiguous.  Anyone on the team familiar with the ontology’s contents would understand the sentence the same way, without difficulty.

Have your business analysts and data architect disambiguate user stories and develop the ontology and physical data layer one iteration ahead of the application developers.

The business analysts should disambiguate user stories in the above fashion, one iteration ahead of the development team’s coding work.  Likewise, the data architect (working with the business analysts) should keep the ontology and data layer one iteration ahead of the application developers.  (The traceability matrix makes it simple to figure out which entity types the next iteration’s user stories require.)  This rule is consistent with the agile idea of limiting detailed planning to the next iteration.  Expressed in lean terms, the business analysts are disambiguating user stories just in time.  Just-in-time story disambiguation and data architecture avoid spending effort on user stories and related data-architecture artifacts that may change before the developers implement them.
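As a sketch of that lookup, assume each traceability-matrix row also records the iteration its story is planned for, and that the data architect keeps a set of entity types already present in the ontology and data layer.  The backlog, iteration numbers, and the READY set below are hypothetical.  The union of the next iteration’s rows is what the data layer must cover, and the difference from READY is the data architect’s to-do list.

```python
# Hypothetical backlog: story id -> (planned iteration, entity types the story requires).
BACKLOG = {
    "US-1": (3, {"line item", "product type", "sales transaction", "merge"}),
    "US-2": (3, {"sales transaction", "line item"}),
    "US-3": (4, {"line item", "discount type"}),
}

# Hypothetical: entity types already in the ontology and the physical data layer.
READY = {"line item", "product type", "sales transaction"}


def required_entity_types(backlog, iteration):
    """Union of the entity types required by the stories planned for one iteration."""
    required = set()
    for planned, entity_types in backlog.values():
        if planned == iteration:
            required |= entity_types
    return required


def data_architecture_gaps(backlog, iteration, ready):
    """Entity types the data architect still has to add before that iteration starts."""
    return sorted(required_entity_types(backlog, iteration) - ready)
```

Here, data_architecture_gaps(BACKLOG, 3, READY) returns [‘merge’], so only that entity type needs attention before iteration 3, and nothing is built yet for iteration 4.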

Have your database technologists and application developers agree on ontology-driven data architecture nomenclature standards.

The standards should specify how entity-type names, attribute names, and relation names are converted into table names, column names, class names, variable names, etc.  The conversion rules should emphasize using entire words where possible.  Likewise, the technologists should agree on a set of rules mapping attribute data types to data-layer data types and programming-language data types.  These rules should ensure that conversions between the data layer and the code layer are lossless, unambiguous, and computationally inexpensive.
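Here is one possible set of rules, written as code so that nobody has to apply them by hand.  The conventions are assumptions for illustration, not a standard: snake_case for tables, columns, and Python variables; PascalCase for classes; and a hypothetical type map that uses SQL NUMERIC with Python Decimal (rather than float) so that prices and percentages round-trip without loss.

```python
import re
from decimal import Decimal


def words(official_name: str) -> list:
    """Split an official ontology name into lowercase whole words."""
    return [w.lower() for w in re.findall(r"[A-Za-z0-9]+", official_name)]


def snake_case(official_name: str) -> str:
    """Hypothetical rule for table, column, and Python variable names."""
    return "_".join(words(official_name))


def pascal_case(official_name: str) -> str:
    """Hypothetical rule for class names."""
    return "".join(w.capitalize() for w in words(official_name))


# Hypothetical mapping: ontology data type -> (SQL column type, Python type).
# Decimal rather than float keeps money and percentages lossless in both layers.
# Sign constraints ("positive", "non-negative") would need CHECK constraints, not shown.
TYPE_MAP = {
    "positive integer": ("INTEGER", int),
    "positive decimal number": ("NUMERIC(12, 2)", Decimal),
    "non-negative decimal number": ("NUMERIC(5, 2)", Decimal),
    "string": ("VARCHAR(255)", str),
}


def column_definition(attribute: str, data_type: str, nullable: bool) -> str:
    """SQL column definition for an ontology attribute."""
    sql_type, _ = TYPE_MAP[data_type]
    return f"{snake_case(attribute)} {sql_type}" + ("" if nullable else " NOT NULL")
```

For example, the line-item entity type becomes the table line_item and the class LineItem, and column_definition("list price in USD", "positive decimal number", False) yields list_price_in_usd NUMERIC(12, 2) NOT NULL.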

When the data architect and developers follow these rules, the data layer and code they produce become self-documenting, given the ontology.  Everyone speaks the same language, and the team avoids fruitless disagreements about what are ultimately development conventions.
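“Self-documenting, given the ontology” can be taken literally: because the naming rules are deterministic, you can map every physical name back to the ontology entry it came from.  The sketch below assumes a hypothetical ontology excerpt and a snake_case naming rule.  It also checks that the rules are unambiguous by refusing to map two different official names to the same physical name.

```python
import re

# Hypothetical ontology excerpt: entity type -> official names of its attributes.
ONTOLOGY = {
    "line item": ["quantity", "list price in USD", "total discount percentage", "extension in USD"],
    "sales transaction": [],
}


def snake_case(official_name: str) -> str:
    """Hypothetical naming rule shared by the data layer and the code."""
    return "_".join(w.lower() for w in re.findall(r"[A-Za-z0-9]+", official_name))


def build_glossary(ontology: dict) -> dict:
    """Map each physical name (table, or table.column) back to its ontology name.

    Raises ValueError if two official names collapse to the same physical name,
    which would mean the naming rules are ambiguous.
    """
    glossary = {}

    def add(physical: str, official: str) -> None:
        if physical in glossary and glossary[physical] != official:
            raise ValueError(f"{physical!r} is ambiguous: {glossary[physical]!r} vs {official!r}")
        glossary[physical] = official

    for entity_type, attributes in ontology.items():
        table = snake_case(entity_type)
        add(table, entity_type)
        for attribute in attributes:
            add(f"{table}.{snake_case(attribute)}", f"{entity_type}: {attribute}")
    return glossary
```

A developer who meets line_item.extension_in_usd in a query can look it up and land on “line item: extension in USD”, whose meaning the ontology already defines.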


In sum, every kind of team member lets the ontology drive certain aspects of the development process:

  • The business analysts use the ontology to disambiguate user stories and maintain a user-story/entity-type traceability matrix.
  • The data architect maintains the ontology and uses it and related standards to simplify nomenclature and data-typing decisions in the physical data layer.
  • The developers use the ontology and related standards to simplify nomenclature and data-typing decisions in the application source code.

Ontology-driven development keeps the resolution of uncertainties about functional requirements ahead of coding activities.  It makes requirements transparent to data technologists and developers.  It makes the data architecture layer transparent to developers, and it eliminates the need to make numerous individual nomenclature decisions.  As a result, communication friction is minimized.  Developers don’t waste time writing code that doesn’t satisfy intended functional requirements, or struggling to understand requirements or data-layer artifacts.  Business analysts avoid formalizing requirements that end up not getting used, without bottlenecking developers’ work by delivering requirements late.  The data architect similarly avoids developing data-layer artifacts that get too far ahead of development, without bottlenecking development.  In these ways and more, ontology-driven development avoids waste, and so is lean development.


