Tagging is one of the web 2.0 concepts that became popular through websites like flickr.com and del.icio.us. The requirements of the tagging data model for such websites is some what different from that of enterprise applications. The next few sections will present the data model for tagging for simple to more complex scenarios.
Basic Tagging Requirements
Tagging differs from traditional categorization schemes where the list of categories are pre-defined and only a few people are allowed to categorize. Tagging is better than the traditional categorization because, the objects can be classified
The Data Model
tag_taggings object_id number not null tag string not null tagged_by number not nullThis is the most simplest data model needed for tagging. And this can support all the above requirements. However, for large volumes, it is better to maintain two tables, one that contains the list of tags and the other that is the actual tagging table. In that case, the following is the data model
tag_tags tag_id number not null tag string not null tagged_date not null tag_taggings tagging_id number not null object_id number not null tag_id number not null tagged_by number not null tagged_date date not nullI have also added the tagged_date, the date on which that particular tagging is done so that it's possible to provide some temporal metrics as well.
In enterprise applications, tagging may be applied to different types of objects. So, instead of creating one set of tagging data model for each object type, it's better to create a generic data model that supports multiple object types. This is achieved by making the following changes to the data model
tag_tags tag_id number not null tag string not null tag_taggings tagging_id number not null object_id number not null object_type_id number not null tag_id number not null tagged_by number not null tagged_date date not null tag_first_by_type booleanThe purpose of tag_first_by_type is to identify the list of tags appplicable for a given object type. This is set to true for only those records where a particular tag is first applied to any object within an object type. Note that if deletion of taggings is supported, then this field needs to be maintained during such delete operations. That is, when a tagging record whose tag_first_by_type is deleted, then the value should be carried to the next tagging action with the same tag to the same object type. An alternative approach would have been to create a separate table that tracks the unique tags applicable for an object type along with the count and explicitly incrementing/decrementing the count.
Note that it is important to track the list of tags that have been applied to each object type. The reason is, when providing support for auto-completing a tag when a person is tagging, it is important to be able to show only those tags that are already applicable for that particular object type and not every tag in the system. Otherwise, a tag such as "hazmat" for object type Products will show up when one types 'h' to complete it with 'humble' for an object type of person. You don't want your potential employees to be tagged as hazmat, isn't it? So, going to this level of tracking really depends on the desired level of usability. If you don't care about every tag in the system showing up for any object type, then you don't need to bother about tracking the list of tags applicable for an object. Also, note that it is still possible to derive the list of tags applicable for an object type without having to maintain this tag_first_by_type flag at all. However, it requires selecting the distinct tags for that object type from the tag_taggings table and that could be an expensive operation and especially so for an ajax auto-completition type of operation. So, this flag is mainly to provide decent performance.
The next requirement in enterprise applications is to keep the tagging private to the user rather than sharing that information. The following change is made to the data model to support that.
tag_tags tag_id number not null tag string not null tag_taggings tagging_id number not null object_id number not null object_type_id number not null tag_id number not null tagged_by number not null tagged_date date not null tag_first_by_type boolean is_private booleanThe is_private is set to true if a user wants his tagging action to be private.
Now that we have the data model, let's look at how the data model can be used to do the various tagging tasks (Oracle syntax is used below)
select 1 from tag_taggings where object_type_id = ? and tag_id = ? and rownum < 2;If there is a result, then tag_first_by_type is false, else it is true. With all the available data, simply insert into tag_taggings. Note that if you want to be careful about preventing a user to tag with the same tag again for the same object, then that check is also required.
select 1 from tag_taggings where object_type_id = ? and object_id = ? and tag_id = ? and tagged_by = ?
select t.tag from tag_tags t ,tag_taggings tg where t.tag_id = tg.tag_id and tg.object_type_id = ? and tg.tag_first_by_type = 'Y' and t.tag like ?
select t.tag,count(1) from tag_tags t ,tag_taggings tg where t.tag_id = tg.tag_id and tg.object_type_id = ? and tg.object_id = ? and (nvl(is_private,'N') = 'N' or tagged_by = ?) group by t.tag
select tg.tag_count,o.* from object o ,(select tg.object_id,count(1) tag_count,max(tagged_date) tagged_date from tag_taggings tg ,tag_tags t where t.tag = ? and tg.tag_id = t.tag_id and tg.object_type_id = ? and (nvl(is_private,'N') = 'N' or tagged_by = ?) group by tg.object_id) tg where o.object_id = tg.object_idNote that in addition to getting the list of objects, the count of the number of times the tag has been applied for that object has also been fetched. Plotting this as a cloud would also give what I would call as tag object cloud. What this gives is, for a given tag, what is the object that is most related to that tag. For example, if you use tagging for the customer entity, it's possible to visualize the most likely up-sellable customers within a tag called 'upsellable'.
It is also possible to list the objects for a tag based on the most recently tagged rather than the number of times tagged. In that case, the tag_date using the max can be used to sort the order.
select t.tag,count(1) from tag_tags t ,tag_taggings tg where t.tag_id = tg.tag_id and tg.object_type_id = ? and (nvl(is_private,'N') = 'N' or tagged_by = ?) group by t.tag
select t.tag,count(1) from tag_tags t ,tag_taggings tg where (nvl(is_private,'N') = 'N' or tagged_by = ?) group by t.tagIn another type, only the object associations are counted and not the tagging actions.
select t.tag,count(distinct tg.object_id) from tag_tags t ,tag_taggings tg where (nvl(is_private,'N') = 'N' or tagged_by = ?) group by t.tagThis type of cloud helps the users identify the frequently tagged objects which likely indicates that overall these objects are most used in the enterprise transactions
The third type of tag cloud is a tag cloud by the taggers.
select t.tag,count(distinct tagged_by) from tag_tags t ,tag_taggings tg where (nvl(is_private,'N') = 'N' or tagged_by = ?) group by t.tagThis type of cloud helps understand who within the enterprise are actively involved in tagging.
Note that it's possible to have some of these clouds displayed for a given object type instead of globally as it would help study the same metrics for individual object types.
If there is no requirement to support private tagging, then some of the operations like tag cloud creation can be more efficient and can be based entirely on the index on the tag_tagging table and not requiring accessing the table.