published: 2008/02/09. tags: web 2.0, tagging, data model, architecture

Tagging For The Enterprise Applications

Tig-Tag-Toe, fun with tags

Tagging is one of the web 2.0 concepts that became popular through websites like flickr.com and del.icio.us. The requirements of the tagging data model for such websites are somewhat different from those of enterprise applications. The next few sections present the data model for tagging, from simple to more complex scenarios.

Basic Tagging Requirements

Tagging invariably coexists with the concept of a tag cloud. A tag cloud is a visual representation of all the tagging information in summary form. The summary can be calculated on the fly or pre-aggregated, either synchronously or asynchronously. However, if a tag cloud is provided for a dynamic set of data identified by a search query, it always has to be computed on the fly. Hence, both on-the-fly computation and pre-aggregation may need to be supported.

Tagging differs from traditional categorization schemes, where the list of categories is pre-defined and only a few people are allowed to categorize. Tagging is better than traditional categorization because objects can be classified

  1. as and when needed
  2. by anyone
  3. using multiple categories

However, tagging can succeed only when tags do not proliferate. To help with this, the UI should suggest already existing tags as the user types, so that typos and differences in tense or cardinality (singular vs. plural) do not create variants of the same tag. For a web-based UI, this is typically done using AJAX.

The Data Model

tag_taggings
  object_id number not null
  tag string not null
  tagged_by number not null

This is the simplest data model needed for tagging, and it can support all the above requirements. However, for large volumes, it is better to maintain two tables: one that contains the list of tags and another that holds the actual taggings. In that case, the data model is as follows:

tag_tags
  tag_id number not null
  tag string not null
  tagged_date date not null

tag_taggings
  tagging_id number not null
  object_id number not null
  tag_id number not null
  tagged_by number not null
  tagged_date date not null

I have also added tagged_date, the date on which a particular tagging was done, so that it's possible to provide some temporal metrics as well.
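As a rough illustration of the two-table model and of computing a tag cloud on the fly, here is a minimal sketch. It uses Python's built-in sqlite3 module as a stand-in for Oracle, so the column types and the INSERT OR IGNORE idiom are SQLite assumptions. The helper names add_tagging and tag_cloud are hypothetical, and the sketch keeps tagged_date only on tag_taggings.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tag_tags (
    tag_id INTEGER PRIMARY KEY,
    tag    TEXT NOT NULL UNIQUE
);
CREATE TABLE tag_taggings (
    tagging_id  INTEGER PRIMARY KEY,
    object_id   INTEGER NOT NULL,
    tag_id      INTEGER NOT NULL REFERENCES tag_tags(tag_id),
    tagged_by   INTEGER NOT NULL,
    tagged_date TEXT NOT NULL  -- ISO date string (assumption: SQLite has no DATE type)
);
""")

def add_tagging(object_id, tag, user_id, day):
    """Reuse the tag row if it already exists, then record the tagging."""
    conn.execute("INSERT OR IGNORE INTO tag_tags (tag) VALUES (?)", (tag,))
    tag_id = conn.execute(
        "SELECT tag_id FROM tag_tags WHERE tag = ?", (tag,)
    ).fetchone()[0]
    conn.execute(
        "INSERT INTO tag_taggings (object_id, tag_id, tagged_by, tagged_date) "
        "VALUES (?, ?, ?, ?)",
        (object_id, tag_id, user_id, day),
    )

def tag_cloud(since="0000-01-01"):
    """Count taggings per tag on the fly, optionally only those since a date."""
    return conn.execute(
        """
        SELECT t.tag, COUNT(*) AS uses
        FROM tag_taggings g
        JOIN tag_tags t ON t.tag_id = g.tag_id
        WHERE g.tagged_date >= ?
        GROUP BY t.tag
        ORDER BY uses DESC, t.tag
        """,
        (since,),
    ).fetchall()

add_tagging(1, "urgent", 100, "2008-01-10")
add_tagging(2, "urgent", 101, "2008-02-01")
add_tagging(2, "billing", 101, "2008-02-05")

print(tag_cloud())              # [('urgent', 2), ('billing', 1)]
print(tag_cloud("2008-02-01"))  # [('billing', 1), ('urgent', 1)]
```

The date filter is one example of the temporal metrics that tagged_date makes possible. A pre-aggregated variant would store the per-tag counts instead of recomputing them on each request.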

In enterprise applications, tagging may be applied to different types of objects. So, instead of creating a separate set of tagging tables for each object type, it's better to create a generic data model that supports multiple object types. This is achieved by making the following changes to the data model:

tag_tags
  tag_id number not null
  tag string not null

tag_taggings
  tagging_id number not null
  object_id number not null
  object_type_id number not null
  tag_id number not null
  tagged_by number not null
  tagged_date date not null
  tag_first_by_type boolean

The purpose of tag_first_by_type is to identify the list of tags applicable to a given object type. It is set to true only for the record where a particular tag is first applied to any object of that object type. Note that if deletion of taggings is supported, this field needs to be maintained during delete operations. That is, when a tagging record whose tag_first_by_type is true is deleted, the flag should be carried over to another tagging record with the same tag and the same object type. An alternative approach would be to create a separate table that tracks the unique tags applicable to each object type along with a count, explicitly incrementing and decrementing the count.
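The flag maintenance on insert and delete could look roughly like the sketch below. It again uses Python's sqlite3 module rather than Oracle, stores the boolean as 0/1, and picks the oldest surviving tagging as the record that inherits the flag. That choice, and the helper names add_tagging and delete_tagging, are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE tag_taggings (
    tagging_id        INTEGER PRIMARY KEY,
    object_id         INTEGER NOT NULL,
    object_type_id    INTEGER NOT NULL,
    tag_id            INTEGER NOT NULL,
    tagged_by         INTEGER NOT NULL,
    tagged_date       TEXT NOT NULL,
    tag_first_by_type INTEGER NOT NULL DEFAULT 0  -- 1 = true, 0 = false
)""")

def add_tagging(object_id, object_type_id, tag_id, user_id, day):
    """Insert a tagging; set the flag if the tag is new for this object type."""
    seen = conn.execute(
        "SELECT 1 FROM tag_taggings "
        "WHERE object_type_id = ? AND tag_id = ? AND tag_first_by_type = 1",
        (object_type_id, tag_id),
    ).fetchone()
    cur = conn.execute(
        "INSERT INTO tag_taggings (object_id, object_type_id, tag_id, "
        "tagged_by, tagged_date, tag_first_by_type) VALUES (?, ?, ?, ?, ?, ?)",
        (object_id, object_type_id, tag_id, user_id, day, 0 if seen else 1),
    )
    return cur.lastrowid

def delete_tagging(tagging_id):
    """Delete a tagging and hand its flag to the oldest surviving sibling."""
    row = conn.execute(
        "SELECT object_type_id, tag_id, tag_first_by_type "
        "FROM tag_taggings WHERE tagging_id = ?",
        (tagging_id,),
    ).fetchone()
    if row is None:
        return
    object_type_id, tag_id, was_first = row
    conn.execute("DELETE FROM tag_taggings WHERE tagging_id = ?", (tagging_id,))
    if was_first:
        # If no sibling remains, the subquery yields NULL and nothing is updated.
        conn.execute(
            """
            UPDATE tag_taggings SET tag_first_by_type = 1
            WHERE tagging_id = (
                SELECT tagging_id FROM tag_taggings
                WHERE object_type_id = ? AND tag_id = ?
                ORDER BY tagged_date, tagging_id
                LIMIT 1
            )
            """,
            (object_type_id, tag_id),
        )

first = add_tagging(1, 10, 7, 100, "2008-01-01")
second = add_tagging(2, 10, 7, 101, "2008-01-02")
delete_tagging(first)
print(conn.execute(
    "SELECT tagging_id, tag_first_by_type FROM tag_taggings").fetchall())
# [(2, 1)]
```

In a multi-user system, the check and the insert (or the delete and the update) would need to run in one transaction so that two concurrent taggings cannot both claim the flag.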

Note that it is important to track the list of tags that have been applied to each object type. The reason is that, when auto-completing a tag as a person is tagging, it is important to show only those tags that are already applicable to that particular object type and not every tag in the system. Otherwise, a tag such as "hazmat" for the object type Products will show up when one types 'h' intending to complete it as 'humble' for the object type Person. You don't want your potential employees to be tagged as hazmat, do you?

So, going to this level of tracking really depends on the desired level of usability. If you don't mind every tag in the system showing up for any object type, then you don't need to track the list of tags applicable to an object type. Also, note that it is still possible to derive the list of tags applicable to an object type without maintaining the tag_first_by_type flag at all. However, that requires selecting the distinct tags for that object type from the tag_taggings table, which could be an expensive operation, especially for an AJAX auto-completion request. So, this flag exists mainly to provide decent performance.
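To make the auto-completion lookup concrete, here is a small sketch of both variants: one that relies on tag_first_by_type and one that derives the same list with DISTINCT. It uses Python's sqlite3 module instead of Oracle. The object type ids, the sample rows, and the function names are invented for illustration, and the prefix is assumed not to contain the LIKE wildcards % or _.

```python
import sqlite3

PRODUCT, PERSON = 1, 2  # hypothetical object_type_id values

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tag_tags (
    tag_id INTEGER PRIMARY KEY,
    tag    TEXT NOT NULL UNIQUE
);
CREATE TABLE tag_taggings (
    tagging_id        INTEGER PRIMARY KEY,
    object_id         INTEGER NOT NULL,
    object_type_id    INTEGER NOT NULL,
    tag_id            INTEGER NOT NULL,
    tagged_by         INTEGER NOT NULL,
    tagged_date       TEXT NOT NULL,
    tag_first_by_type INTEGER NOT NULL DEFAULT 0
);
INSERT INTO tag_tags (tag_id, tag) VALUES (1, 'hazmat'), (2, 'humble');
INSERT INTO tag_taggings
    (object_id, object_type_id, tag_id, tagged_by, tagged_date, tag_first_by_type)
VALUES
    (500, 1, 1, 100, '2008-01-01', 1),  -- first 'hazmat' on a product
    (501, 1, 1, 101, '2008-01-02', 0),  -- later 'hazmat' on a product
    (900, 2, 2, 100, '2008-01-03', 1);  -- first 'humble' on a person
""")

def suggest(object_type_id, prefix):
    """Suggest tags for one object type using the tag_first_by_type flag."""
    rows = conn.execute(
        """
        SELECT t.tag
        FROM tag_taggings g
        JOIN tag_tags t ON t.tag_id = g.tag_id
        WHERE g.object_type_id = ?
          AND g.tag_first_by_type = 1
          AND t.tag LIKE ?
        ORDER BY t.tag
        """,
        (object_type_id, prefix + "%"),
    )
    return [r[0] for r in rows]

def suggest_distinct(object_type_id, prefix):
    """Same result without the flag; has to scan every tagging of the type."""
    rows = conn.execute(
        """
        SELECT DISTINCT t.tag
        FROM tag_taggings g
        JOIN tag_tags t ON t.tag_id = g.tag_id
        WHERE g.object_type_id = ?
          AND t.tag LIKE ?
        ORDER BY t.tag
        """,
        (object_type_id, prefix + "%"),
    )
    return [r[0] for r in rows]

print(suggest(PERSON, "h"))   # ['humble']
print(suggest(PRODUCT, "h"))  # ['hazmat']
```

With the flag, the query touches one row per (object type, tag) pair, while the DISTINCT variant reads one row per tagging before removing duplicates.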

The next requirement in enterprise applications is to keep the tagging private to the user rather than sharing that information. The following change is made to the data model to support that.

tag_tags
  tag_id number not null
  tag string not null

tag_taggings
  tagging_id number not null
  object_id number not null
  object_type_id number not null
  tag_id number not null
  tagged_by number not null
  tagged_date date not null
  tag_first_by_type boolean
  is_private boolean

The is_private flag is set to true if a user wants a tagging action to be private.
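One way the is_private flag might be applied when building a tag cloud is sketched below, again with Python's sqlite3 module standing in for Oracle. The visibility rule used here (a private tagging counts only for the user who made it) is an assumption about the intended semantics, and the sample rows and the tag_cloud helper are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tag_taggings (
    tagging_id        INTEGER PRIMARY KEY,
    object_id         INTEGER NOT NULL,
    object_type_id    INTEGER NOT NULL,
    tag_id            INTEGER NOT NULL,
    tagged_by         INTEGER NOT NULL,
    tagged_date       TEXT NOT NULL,
    tag_first_by_type INTEGER NOT NULL DEFAULT 0,
    is_private        INTEGER NOT NULL DEFAULT 0  -- 1 = private to tagged_by
);
INSERT INTO tag_taggings
    (object_id, object_type_id, tag_id, tagged_by, tagged_date,
     tag_first_by_type, is_private)
VALUES
    (1, 10, 7, 100, '2008-01-01', 1, 0),  -- public tagging by user 100
    (2, 10, 7, 101, '2008-01-02', 0, 1),  -- private tagging by user 101
    (3, 10, 8, 101, '2008-01-03', 1, 1);  -- private tagging by user 101
""")

def tag_cloud(object_type_id, viewer_id):
    """Per-tag counts visible to one viewer: public rows plus their own private rows."""
    return conn.execute(
        """
        SELECT tag_id, COUNT(*) AS uses
        FROM tag_taggings
        WHERE object_type_id = ?
          AND (is_private = 0 OR tagged_by = ?)
        GROUP BY tag_id
        ORDER BY tag_id
        """,
        (object_type_id, viewer_id),
    ).fetchall()

print(tag_cloud(10, 100))  # [(7, 1)]
print(tag_cloud(10, 101))  # [(7, 2), (8, 1)]
```

Because the result depends on who is viewing, a tag cloud that honours private taggings cannot be served from a single pre-aggregated count per tag.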

Now that we have the data model, let's look at how the data model can be used to do the various tagging tasks (Oracle syntax is used below)

Now that the various SQL statements have been identified, it is possible to identify the necessary indexes. Below is the list of indexes.

It is possible to partition the tag_taggings table by object_type_id. In that case, the above indexes that have object_type_id as the leading column can be converted to local indexes without the object_type_id column.

If there is no requirement to support private tagging, then some operations, such as tag cloud creation, can be more efficient: they can be answered entirely from an index on the tag_taggings table, without accessing the table itself.
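The effect can be observed with a small experiment. The sketch below uses Python's sqlite3 module and SQLite's EXPLAIN QUERY PLAN as a stand-in for an Oracle execution plan, so the plan text is SQLite-specific. The index name tag_taggings_type_tag and its column list are assumptions, not the article's original index list.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tag_taggings (
    tagging_id        INTEGER PRIMARY KEY,
    object_id         INTEGER NOT NULL,
    object_type_id    INTEGER NOT NULL,
    tag_id            INTEGER NOT NULL,
    tagged_by         INTEGER NOT NULL,
    tagged_date       TEXT NOT NULL,
    tag_first_by_type INTEGER NOT NULL DEFAULT 0,
    is_private        INTEGER NOT NULL DEFAULT 0
);
-- Hypothetical index: enough to answer a per-type tag cloud on its own.
CREATE INDEX tag_taggings_type_tag ON tag_taggings (object_type_id, tag_id);
""")

def plan(sql, params):
    """Return SQLite's query plan as one string (last column is the detail text)."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return " | ".join(row[-1] for row in rows)

# Tag cloud without private tagging: only indexed columns are referenced.
public_sql = """
    SELECT tag_id, COUNT(*) FROM tag_taggings
    WHERE object_type_id = ?
    GROUP BY tag_id
"""

# Tag cloud with private tagging: is_private and tagged_by live only in the table.
private_sql = """
    SELECT tag_id, COUNT(*) FROM tag_taggings
    WHERE object_type_id = ?
      AND (is_private = 0 OR tagged_by = ?)
    GROUP BY tag_id
"""

public_plan = plan(public_sql, (10,))
private_plan = plan(private_sql, (10, 100))
print(public_plan)
print(private_plan)
```

In SQLite, the first plan reports a covering index, meaning the table rows are never read, while the second has to visit the table for the privacy columns. Adding is_private and tagged_by to the index would be one way to restore index-only access, at the cost of a wider index.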

© 2008 Dirisala.Net/articles