Documentum 6.5 Content Management Foundations — Save 50%
Learn the technical fundamentals of Documentum 6.5, develop insights with illustrated examples from a real-life business scenario, and ace the E20-120 exam with this book and eBook
In this article series by Pawan Kumar, author of Documentum 6.5 Content Management Foundations, we will explore the following concepts:
- Objects and types
- Type hierarchies and type categories
- Object and content persistence
- Lightweight and shareable object types
- Querying objects
Documentum uses an object-oriented model to store information within the repository. Everything stored in the repository participates in this object model in some way. For example, a user, a document, and a folder are all represented as objects. An object stores data in its properties (also known as attributes) and has methods that can be used to interact with the object.
A content item stored in the repository has an associated object to store its metadata. Since metadata is stored in object properties, the terms metadata and properties are used interchangeably. For example, a document stored in the repository may have its title, subject, and keywords stored in the associated object. However, note that objects can exist in the repository without an associated content item. Such objects are sometimes referred to as contentless objects. For example, user objects and permission set objects do not have any associated content.
Each object property has a data type, which can be one of boolean, integer, string, double, time, or ID. A boolean value is true or false. An integer value is a whole number. A string value consists of text. A double value is a floating point number. A time value represents a timestamp, including dates. An ID value represents an object ID that uniquely identifies an object in the repository. Object IDs are discussed in detail later in this article.
A property can be single-valued or repeating. Each single-valued property holds one value. For example, the object_name property of a document contains one value and it is of type string. This means that the document can only have one name. On the other hand, keywords is a repeating property and can have multiple string values. For example, a document may have object_name='LoanApp_1234567891.txt' and keywords='John Doe','application','1234567891'.
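As a rough illustration of this distinction, the sketch below models the example object in plain Python. The RepoObject class and its two dictionaries are hypothetical stand-ins invented for this example; they are not part of DFC or any other Documentum API.

```python
from dataclasses import dataclass, field


@dataclass
class RepoObject:
    """Hypothetical in-memory stand-in for a repository object."""
    # Single-valued properties: property name -> exactly one value.
    single: dict = field(default_factory=dict)
    # Repeating properties: property name -> ordered list of values.
    repeating: dict = field(default_factory=dict)

    def set_value(self, name, value):
        """Set a single-valued property, replacing any earlier value."""
        self.single[name] = value

    def append_value(self, name, value):
        """Append one more value to a repeating property."""
        self.repeating.setdefault(name, []).append(value)


doc = RepoObject()
doc.set_value("object_name", "LoanApp_1234567891.txt")
for keyword in ("John Doe", "application", "1234567891"):
    doc.append_value("keywords", keyword)

print(doc.single["object_name"])   # the document has one name only
print(doc.repeating["keywords"])   # the keywords keep their order
```

Setting object_name a second time would overwrite the first value, whereas every call to append_value adds another keyword.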
The following figure shows a visual representation of this object. Typically, only properties are shown on the object while methods are shown when needed. Furthermore, only the properties relevant to the discussion are shown. Objects will be illustrated in this manner throughout the article series:
Methods are operations that can be performed on an object. An operation often alters some properties of the object. For example, the checkout method can be used to check out an object. Checking out an object sets the r_lock_owner property to the name of the user performing the checkout. Methods are usually invoked programmatically using Documentum Foundation Classes (DFC), though they can also be invoked indirectly using the API. In general, Documentum Query Language (DQL) cannot be used to invoke arbitrary methods on objects. DQL is discussed later in this article.
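The effect of a checkout on r_lock_owner can be pictured with a small model. This is a sketch only: the SketchObject class is hypothetical, it does not call DFC, and treating an empty string as "not checked out" and refusing a second checkout by another user are assumptions made for the example.

```python
class SketchObject:
    """Hypothetical model of an object that supports a checkout operation."""

    def __init__(self, object_name):
        self.object_name = object_name
        # Assumption: an empty string means that nobody holds the lock.
        self.r_lock_owner = ""

    def checkout(self, user_name):
        """Record user_name as the lock owner unless someone else holds the lock."""
        if self.r_lock_owner and self.r_lock_owner != user_name:
            raise RuntimeError(f"already checked out by {self.r_lock_owner}")
        self.r_lock_owner = user_name


doc = SketchObject("LoanApp_1234567891.txt")
doc.checkout("jdoe")
print(doc.r_lock_owner)
```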
Note that the term method may be used in two different contexts within Documentum. A method as a defined operation on an object type is usually invoked programmatically through DFC. There is also the concept of a method representing code that can be invoked via a job, workflow activity, or a lifecycle operation. This qualification will be made explicit when the context is not clear.
Working with objects
We used Webtop for performing various operations on documents, where the term document referred to an object with content. Some of these operations are not specific to content and apply to objects in general. For example, checkout and checkin can be performed on contentless objects as well. On the other hand, import, export, and renditions deal specifically with content. Talking specifically about operations on metadata, we can view, modify, and export object properties using Webtop.
Viewing and editing properties
Using Webtop, object properties can be viewed using the View | Properties menu item, shortcut P, or the right-click context menu. The following screenshot shows the properties of the example object discussed earlier. Note that the same screen can be used to modify and save the properties as well.
Multiple objects can be selected before viewing properties. In this case, a special dialog shows the common properties for the selected objects, as shown in the following figure. Any changes made on this dialog are applied to all the selected objects.
On the properties screen, single-valued properties can be edited directly, while repeating properties are edited on a separate screen reached through Edit links. Some properties cannot be modified by users at any time. Other properties may not be editable because object security prevents it or because the object is immutable.
Certain operations on an object mark it as immutable, which means that object properties cannot be changed. An object is marked immutable by setting r_immutable_flag to true. Content Server prevents changes to the content and metadata of an immutable object with the exception of a few special attributes that relate to the operations that are still allowed on immutable objects. For example, users can set a version label on the object, link the object to a folder, unlink it from a folder, delete it, change its lifecycle, and perform one of the lifecycle operations such as promote/demote/suspend/resume. The attributes affected by the allowed operations are allowed to be updated.
An object is marked immutable in the following situations:
- When an object is versioned or branched, it becomes an old version and is marked immutable.
- An object can be frozen which makes it immutable and imposes some other restrictions. Some virtual document operations can freeze the involved objects.
- A retention policy can make the documents under its control immutable.
Certain operations, such as unfreezing a document, can reset the immutability flag, making the object changeable again.
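The immutability rule can be sketched as a guard around property updates. Everything here is illustrative: GuardedObject and ImmutableError are hypothetical names, and the contents of ALLOWED_WHEN_IMMUTABLE are an assumption chosen to match two of the allowed operations named above (setting a version label and linking to a folder), not the server's actual list.

```python
# Attributes that may still change on an immutable object.
# Assumption: this set is illustrative, not the list Content Server uses.
ALLOWED_WHEN_IMMUTABLE = {"r_version_label", "i_folder_id"}


class ImmutableError(Exception):
    """Raised when a change is attempted on a protected property."""


class GuardedObject:
    """Hypothetical object that enforces r_immutable_flag on updates."""

    def __init__(self, **props):
        self.props = dict(props)
        self.props.setdefault("r_immutable_flag", False)

    def set(self, name, value):
        """Update a property unless the object is immutable and the property is protected."""
        if self.props["r_immutable_flag"] and name not in ALLOWED_WHEN_IMMUTABLE:
            raise ImmutableError(f"cannot change {name} on an immutable object")
        self.props[name] = value


doc = GuardedObject(title="Loan application")
doc.set("title", "Loan application (final)")      # allowed: still changeable
doc.props["r_immutable_flag"] = True               # for example, after versioning
doc.set("r_version_label", ["1.0", "APPROVED"])    # still allowed
try:
    doc.set("title", "Changed again")
except ImmutableError as err:
    print(err)
```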
Metadata can be exported from repository lists, such as folder contents and search results. Property values of the objects are exported and saved as a .csv (comma-separated values) file, which can be opened in Microsoft Excel or in a text editor. Metadata export can be performed using Tools | Export to CSV menu item or the right-click context menu. Before exporting the properties, the user is able to choose the properties to export from the available ones.
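To make the shape of such an export concrete, here is a minimal sketch that writes chosen properties to CSV text with Python's standard csv module. It does not reproduce Webtop's actual file layout; the export_properties function is hypothetical, and joining repeating values with "; " inside one cell is an assumption.

```python
import csv
import io


def export_properties(objects, columns):
    """Return CSV text with one row per object and one column per chosen property."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(columns)
    for obj in objects:
        row = []
        for name in columns:
            value = obj.get(name, "")
            # Assumption: repeating values are flattened into a single cell.
            if isinstance(value, list):
                value = "; ".join(str(item) for item in value)
            row.append(value)
        writer.writerow(row)
    return buffer.getvalue()


objects = [
    {
        "object_name": "LoanApp_1234567891.txt",
        "keywords": ["John Doe", "application", "1234567891"],
    },
]
print(export_properties(objects, ["object_name", "keywords"]))
```

Passing a shorter column list mirrors the step where the user picks which properties to export.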
Objects in a repository may represent different kinds of entities – one object may represent a workflow while another object may represent a document, for example. As a result, these objects may have different properties and methods. Every time Content Server creates an object, it needs to determine the properties and methods that the object is going to possess. This information comes from an object type (also referred to as type).
The term attribute is synonymous with property and the two are used interchangeably. It is common to use the term attribute when talking about a property name and to use property when referring to its value. We will use a dot notation to indicate that an attribute belongs to an object or a type. For example, objectA.title or dm_sysobject.object_name. This notation is succinct and unambiguous and is consistent with many programming languages.
An object type is a template for creating objects. In other words, an object is an instance of its type. A Documentum repository contains many predefined types and allows addition of new user-defined types (also known as custom types).
The most commonly used predefined object type for storing documents in the repository is dm_document. We have already seen how folders are used to organize documents. Folders are stored as objects of type dm_folder. A cabinet is a special kind of folder that does not have a parent folder and is stored as an object of type dm_cabinet. Users are represented as objects of type dm_user and a group of users is represented as an object of dm_group. Workflows use a process definition object of type dm_process, while the definition of a lifecycle is stored in an object of type dm_policy. The following figure shows some of these types:
Just like everything else in the repository, a type is also represented as an object, which holds structural information about the type. This object is of type dm_type and stores information such as the name of the type, name of its supertype, and details about the attributes in the type. The following figure shows an object of type dm_document and an object of type dm_type representing dm_document. It also indicates how the type hierarchy information is stored in the object of type dm_type.
The types present in the repository can be viewed using Documentum Administrator (DA). The following screenshot shows some attributes for the type dm_sysobject. This screen provides controls to scroll through the attributes when there are a large number of attributes present. The Info tab provides information about the type other than the attributes.
While the obvious use of a type is to define the structure and behavior of one kind of object, there is another very important utility of types. A type can be used to refer to all the objects of that type as a set. For example, queries restrict their scope by specifying a type where only the objects of that type are considered for matches. In our example scenario, the loan officer may want to search for all loan applications assigned to her. This query will be straightforward if there is an object type for loan applications. Queries are introduced later in this article.
As another example, audit events can be restricted to a particular object type resulting in only the objects of this type being audited.
Type names and property names
Each object type uses an internal type name, such as dm_document, which is used for uniquely identifying the type within queries and application code. Each type also has a label, which is a user-friendly name often used by applications for displaying information to the end users. For example, the type dm_document has the label Document.
Conventionally, internal names of predefined (defined by Documentum for Content Server or other client products) types start with dm, as described here:
- dm_: (general) represents commonly used object types such as dm_document, which is generally used for storing documents.
- dmr_: (read only) represents read-only object types such as dmr_content, which stores information about a content file.
- dmi_: (internal) represents internal object types such as dmi_workitem, which stores information about a task.
- dmc_: (client) represents object types supporting Documentum client applications. For example, dmc_calendar objects are used by Collaboration Services for holding calendar events.
Just like an object type, each property also has an internal name and a label. For example, the label for property object_name is Name. There are some additional conventions for internal names for properties. These names may begin with the following prefixes:
- r_: (read only) normally indicates that the property is controlled by the Content Server and cannot be modified by users or applications. For example, r_object_id represents the unique ID for the object. On the other hand, r_version_label is an interesting property. It is a repeating property and has at least one value supplied by the Content Server while others may be supplied by users or applications.
- i_: (internal) is similar to r_ except that this property is used internally by the Content Server and normally not seen by users and applications. For example, i_chronicle_id binds all the versions together into a version tree and is managed by the Content Server.
- a_: (application) indicates that this property is intended to be used by applications and can be modified by applications and users. For example, the format of a document is stored in a_content_type. This property helps Webtop launch an appropriate desktop application to open a document. The other three prefixes can also be considered to imply system or non-application attributes, in general.
- _: (computed) indicates that this property is not stored in the repository and is computed by Content Server as needed. These properties are also normally read-only for applications. For example, each object has a property called _changed, which indicates whether it has been changed since it was last saved. Many of the computed properties are related to security and most are used for caching information in user sessions.
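The property-name prefixes listed above lend themselves to a small lookup. The sketch below is only a naming-convention helper: classify_property is a hypothetical function, and because the prefixes are conventions (r_version_label, for instance, accepts user-supplied values), its answer is a hint rather than a guarantee about how the server treats a property.

```python
# Prefix -> meaning, following the conventions listed above.
PROPERTY_PREFIXES = [
    ("r_", "read only"),
    ("i_", "internal"),
    ("a_", "application"),
    ("_", "computed"),
]


def classify_property(name):
    """Return the conventional meaning of a property name's prefix."""
    for prefix, meaning in PROPERTY_PREFIXES:
        if name.startswith(prefix):
            return meaning
    return "no conventional prefix"


for name in ("r_object_id", "i_chronicle_id", "a_content_type", "_changed", "object_name"):
    print(name, "->", classify_property(name))
```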
It is common for different types to be related in some way and to share attributes and methods. In true object-oriented style, Documentum allows persistent types to be organized in an inheritance-based type hierarchy. A type can have at most one supertype, and it inherits all the supertype's attributes as its own. The complete set of attributes belonging to a type is the union of the inherited attributes and the attributes explicitly defined for that type. In this relationship, the inheriting type is called a subtype.
A type with no supertype is called a null type.
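The rule that a type's attributes are the union of inherited and explicitly defined attributes can be sketched with a small recursive lookup. The TYPES table is a heavily trimmed, hypothetical picture of the hierarchy: only a few dm_sysobject attributes are listed, and loan_application with its loan_amount attribute is an invented custom type.

```python
# Hypothetical, trimmed definitions: type name -> (supertype, own attributes).
TYPES = {
    "dm_sysobject": (None, ["object_name", "title", "keywords"]),
    "dm_document": ("dm_sysobject", []),
    "loan_application": ("dm_document", ["loan_amount"]),  # invented custom type
}


def all_attributes(type_name):
    """Return inherited attributes followed by the attributes the type defines itself."""
    supertype, own = TYPES[type_name]
    inherited = all_attributes(supertype) if supertype else []
    return inherited + own


print(all_attributes("dm_document"))
print(all_attributes("loan_application"))
```

In this sketch dm_sysobject plays the role of a type with no supertype, so the recursion stops there.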
The super and sub prefixes are based on the visual representation of this relationship where the supertype is positioned logically higher than the subtype, as shown in the following figure:
Note that supertype and subtype are relative terms. This means that when using either of these terms we refer to two types. A type can be a subtype for one type and supertype for another type at the same time. When many such relationships are visually represented together, they create a tree structure known as a type hierarchy. Readers familiar with object-oriented modeling will recognize this type hierarchy as a class-inheritance hierarchy. The following figure shows a portion of the type hierarchy for the predefined Documentum types:
dm_document is an important type since it represents a document and is one of the most commonly used types. It is an interesting type because it has no properties of its own and it inherits all its properties from dm_sysobject.
One may question the point of having a separate type without any properties of its own. Recall the comment about using a type for treating the objects of that type as a set. dm_document as a separate type enables us to refer to all the objects of this type and subtypes as a set. It can also be used for the complementary set, for example, identifying all the objects of type dm_sysobject and its subtypes which are not of the type dm_document.
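The idea of using a type to select a set of objects, and its complement, can be sketched as follows. The SUPERTYPE map is a trimmed view of the hierarchy assumed for this example, the object names are made up, and is_a is a hypothetical helper rather than a Documentum call.

```python
# Trimmed hierarchy assumed for this example: type name -> supertype.
SUPERTYPE = {
    "dm_sysobject": None,
    "dm_document": "dm_sysobject",
    "dm_folder": "dm_sysobject",
    "dm_cabinet": "dm_folder",
}


def is_a(type_name, ancestor):
    """True if type_name is ancestor itself or one of its direct or indirect subtypes."""
    while type_name is not None:
        if type_name == ancestor:
            return True
        type_name = SUPERTYPE[type_name]
    return False


# Made-up objects as (object_name, type name) pairs.
objects = [
    ("LoanApp_1234567891.txt", "dm_document"),
    ("Applications", "dm_folder"),
    ("Loans", "dm_cabinet"),
]

documents = [name for name, type_name in objects if is_a(type_name, "dm_document")]
non_documents = [
    name
    for name, type_name in objects
    if is_a(type_name, "dm_sysobject") and not is_a(type_name, "dm_document")
]
print(documents)
print(non_documents)
```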
r_object_id is a special property of every persistent object. It is used to uniquely identify the object and encodes some information within the property itself. It is a 16-character string value where each character is a hex (hexadecimal) digit.
The first two digits constitute a type tag representing the type of the object. For example, 09 means that the object has a type that is dm_document or its subtype – the object represents a document rather than a user, group, or something else. The next six digits represent the repository ID – a numeric identifier assigned to the repository. The last eight digits represent a unique ID within the repository and this ID is generated by the Content Server. The following figure illustrates the structure of the object ID and also shows the decimal values corresponding to the hex values.
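The layout just described (two digits of type tag, six of repository ID, eight of per-repository ID) can be unpacked with a few string slices. This is a minimal sketch: parse_object_id is a hypothetical helper, and the sample ID is made up for illustration.

```python
import string


def parse_object_id(object_id):
    """Split a 16-digit hex object ID into its three parts."""
    if len(object_id) != 16 or any(ch not in string.hexdigits for ch in object_id):
        raise ValueError("an object ID must consist of exactly 16 hex digits")
    return {
        "type_tag": object_id[:2],                  # e.g. 09 for dm_document and its subtypes
        "repository_id": int(object_id[2:8], 16),   # decimal value of the next six digits
        "unique_id": int(object_id[8:], 16),        # decimal value of the last eight digits
    }


parts = parse_object_id("0900000a80000f3c")  # made-up ID
print(parts)
```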
Note that EMC assigns a unique range of repository IDs to each of its customers for the various repositories served by their Content Server installations. As long as these assigned repository IDs are used uniquely, r_object_id will uniquely identify an object across all repositories.
Object types are categorized into standard and special categories for internal management by Content Server. A type is a standard type if it is not in one of the special categories. Most of the commonly used types are standard types. The special object type categories are:
- Shareable: Shareable object types work in conjunction with lightweight object types. A single instance of a shareable type can be shared among many lightweight objects.
- Lightweight: Lightweight object types are used to minimize storage for multiple objects that share common system information. The shared properties reside in an instance of a shareable type and the rest of the properties reside in the lightweight objects. A lightweight type is a subtype of a shareable type.
- Aspect property: Aspects enable addition of properties to any object regardless of its type. Aspect property object types are used internally for managing properties associated with aspects. Users and client applications are not aware of these types.
- Data table: Data table is a collaboration feature that enables users to manage structured collection of information.
Since shareable and lightweight object types are used together, they are discussed together later in this article. Aspect property types are internal types not visible to users, so they are not discussed further. However, aspects, the corresponding feature for users, are discussed later in this article. Data table object types are used by the optional Collaboration Services and are not discussed further.
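The storage-saving idea behind shareable and lightweight types can be pictured as many small objects pointing at one shared parent. This is a conceptual sketch only: the two classes are hypothetical, the property names and values are chosen for illustration, and nothing here reflects how Content Server actually implements the feature.

```python
class ShareableParent:
    """Holds the property values common to many lightweight objects."""

    def __init__(self, **shared):
        self.shared = shared


class LightweightObject:
    """Stores only its own properties and defers to the shared parent for the rest."""

    def __init__(self, parent, **own):
        self.parent = parent
        self.own = own

    def get(self, name):
        """Look in the object's own properties first, then in the shared parent."""
        if name in self.own:
            return self.own[name]
        return self.parent.shared[name]


parent = ShareableParent(owner_name="loan_officer", subject="Loan statements")
statements = [
    LightweightObject(parent, object_name=f"statement_{month:02d}.pdf")
    for month in range(1, 4)
]
print([s.get("object_name") for s in statements])
print({s.get("owner_name") for s in statements})  # one shared value, stored once
```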
Content Server explicitly identifies the category of an object type using the dm_type.type_category attribute, whose value indicates one of the following:
- Standard object type
- Aspect property object type
- Shareable object type
- Lightweight object type
- Data table type
Objects stored in the repository are called persistent objects and their types are referred to as persistent types. All persistent types are part of a type hierarchy rooted in the internal type persistent object, which has the following attributes:
- r_object_id: This is used for uniquely identifying the object and is assigned by Content Server. This property has been described earlier in this article.
- i_vstamp: This is used internally for managing object updates and holds the number of committed transactions that have altered this object.
- i_is_replica: This is used in replication and determines whether an object is a replica of another in a different repository. Object replication replicates (copies) objects, both content and metadata, from a source repository to a target repository. The object copies in the target repository are known as replica objects.
Objects are stored in the repository using object-relational technology where properties are stored in (relational) database tables. Each persistent type is represented by two tables in the repository database – one for storing the single-valued properties and the other for storing the repeating properties. Single-valued properties for a type are stored in a table named type_name_s, while repeating properties are stored in a table named type_name_r.
For both single-valued and repeating properties, the property names map to the column names in the tables. Further, all of the _s and _r tables also have a column named r_object_id. The r_object_id column is used to join the single-valued and repeating properties along with the inherited properties to bring all the properties of an object together.
The structure of the _r tables is worth paying extra attention to. Each object can have multiple rows in the _r table where each column represents one repeating property. There is also an internal attribute named i_position, which defines the order of the values. Usually, two repeating properties of an object are not related to each other. For example, authors and i_folder_id are two repeating properties of dm_sysobject and there is no relation between an author and the ID of a folder that the object is linked to. Yet, these two values may be present in the same record in the table dm_sysobject_r.
While there is no requirement for two repeating attributes to be related there is no prohibition either. Indeed, various types have two or more repeating attributes that are related and correspond to one another by index position. For example, dm_policy represents a lifecycle and has several repeating attributes which correspond to each other by index position where each index represents one state. Among these attributes, state_description will hold the description for the state named in state_name, whose internal state number is stored in i_state_no.
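Repeating attributes that correspond by index position can be read together by walking the lists in parallel. In the sketch below only the three attribute names come from the text; the state names, descriptions, and numbers are made-up values.

```python
# Made-up values for three lifecycle states; only the attribute names are real.
state_name = ["Draft", "Review", "Approved"]
state_description = ["Being written", "Awaiting sign-off", "Signed off"]
i_state_no = [0, 1, 2]

# Values at the same index belong to the same state.
states = [
    {
        "index": index,
        "i_state_no": number,
        "state_name": name,
        "state_description": description,
    }
    for index, (number, name, description) in enumerate(
        zip(i_state_no, state_name, state_description)
    )
]
for state in states:
    print(state)
```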
This storage scheme lets us determine the number of records for an object in a type's _r table. It equals the largest number of values held by any repeating property defined directly on that type. Inherited repeating properties do not count, because they are stored in the _r table of the supertype that defines them.
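The way unrelated repeating properties share rows can be sketched by laying the value lists side by side and padding the shorter ones. The r_table_rows function is hypothetical, the property values are placeholders, and the i_position numbers are a simple counter used for illustration; the values the server actually stores in that column may differ.

```python
from itertools import zip_longest


def r_table_rows(r_object_id, repeating):
    """Lay out repeating properties as rows, one column per property."""
    columns = list(repeating)
    rows = []
    # zip_longest pads shorter lists with None, so the row count equals
    # the length of the longest value list.
    for position, values in enumerate(zip_longest(*repeating.values())):
        row = {"r_object_id": r_object_id, "i_position": position}
        row.update(zip(columns, values))
        rows.append(row)
    return rows


rows = r_table_rows(
    "0900000a80000f3c",  # made-up object ID
    {
        "authors": ["Jane Author"],
        "i_folder_id": ["folder-id-1", "folder-id-2"],  # placeholder IDs
        "keywords": ["John Doe", "application", "1234567891"],
    },
)
for row in rows:
    print(row)
```

The first row carries an author, a folder ID, and a keyword that have nothing to do with each other, which is the situation described above for dm_sysobject_r.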
Consider an example where a custom type dq_document has dm_document as its supertype, as shown in the following figure:
The following figure illustrates persistence for an object of dq_document. This figure only shows a small number of attributes for brevity. Note that the tables used for persisting objects of a particular type only store the properties explicitly defined for that type. Inherited properties are stored in the tables for the supertypes where they belong. However, all such persistence tables have the r_object_id column, which is used for joining information from other tables. Also note that dm_document does not add any properties of its own and dq_document does not add any repeating properties so the corresponding tables are absent from this figure. Another example of a type that doesn't use one of the tables is dm_folder, which has two repeating properties but no single-valued properties of its own.
Looking at the figure above, you may be wondering how we would know that this is an object of dq_document since the table name doesn't indicate that. In fact, these tables hold information for objects of all subtypes of dm_sysobject as well. The exact object type of an object is identified by the dm_sysobject_s.r_object_type column (not shown in the figure above).
It is useful to know how properties are stored in database tables though all the properties of an object can be queried together using DQL without any reference to these tables. Internally, Content Server uses database views that join appropriate tables to retrieve all the needed properties of the type together. Further, when multiple types are used in one DQL query the DQL parser applies appropriate table joins to achieve the intended effect.
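The table layout and the joining view can be imitated in an in-memory SQLite database. This is a sketch under several assumptions: the tables carry only a handful of columns, loan_amount is an invented attribute for the custom type dq_document, the view name dq_document_single is made up, the object ID is fictitious, and the i_position values are a plain counter. It is not the schema or the views Content Server actually creates.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dm_sysobject_s (r_object_id TEXT PRIMARY KEY, object_name TEXT, r_object_type TEXT);
CREATE TABLE dm_sysobject_r (r_object_id TEXT, i_position INTEGER, keywords TEXT);
-- Only the property defined on the custom type itself lives here.
CREATE TABLE dq_document_s (r_object_id TEXT PRIMARY KEY, loan_amount REAL);
-- Hypothetical view that brings the single-valued properties together.
CREATE VIEW dq_document_single AS
    SELECT s.r_object_id, s.object_name, s.r_object_type, d.loan_amount
    FROM dm_sysobject_s s
    JOIN dq_document_s d ON d.r_object_id = s.r_object_id;
""")

object_id = "0900000a80000f3c"  # made-up ID
conn.execute(
    "INSERT INTO dm_sysobject_s VALUES (?, ?, ?)",
    (object_id, "LoanApp_1234567891.txt", "dq_document"),
)
conn.execute("INSERT INTO dq_document_s VALUES (?, ?)", (object_id, 250000.0))
conn.executemany(
    "INSERT INTO dm_sysobject_r VALUES (?, ?, ?)",
    [(object_id, i, kw) for i, kw in enumerate(["John Doe", "application", "1234567891"])],
)

single = conn.execute(
    "SELECT * FROM dq_document_single WHERE r_object_id = ?", (object_id,)
).fetchone()
keywords = [
    row[0]
    for row in conn.execute(
        "SELECT keywords FROM dm_sysobject_r WHERE r_object_id = ? ORDER BY i_position",
        (object_id,),
    )
]
print(single)
print(keywords)
```

The r_object_type column in dm_sysobject_s is what tells us that this row belongs to an object of dq_document, even though the table is shared with every other subtype of dm_sysobject.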
While most of the types represent persistent objects, there are some types whose objects are used for temporarily storing information in memory. These objects are not stored in the repository and are called non-persistent objects. For example, a client config object holds the configuration parameters for sessions when a client attempts to connect to the Content Server.
About the Author:
Pawan Kumar is a Technical Architect with current expertise in Enterprise Content Management with EMC Documentum. He has an MS in Computer Science from University of North Carolina at Chapel Hill and a BS in Electrical Engineering from the Indian Institute of Technology, New Delhi (India).
Pawan has experience developing products as well as delivering business solutions on the Documentum platform and has created two products for this platform. He is intimately familiar with effective processes and tools for achieving business objectives through Documentum-based technology solutions. He has led and executed requirements and design workshops, architecture design, scoping, estimation, project planning, resource planning, technical design, software development, software testing, solution roll-out, and ongoing support for the deployed solutions. Pawan has been architecting, designing, and developing enterprise applications for ten years. He has developed software systems for financial services, healthcare, pharmaceutical, logistics, energy services, and retail industries. His expertise spans solution architecture, document management, system integration, web content management, business process management, imaging and input management, and custom application development.
Currently, Pawan provides consulting and training services through doQuent (http://doquent.com), which was founded with the vision of enabling client success in content-related business initiatives. He also believes in giving back to the community. He founded the free online Documentum community dm_cram (http://dmcram.org), which is a test preparation resource for Documentum exams. He is also an active contributor to the Documentum-users Yahoo! User group, where Documentum community members seek help for their technical challenges. He can be reached at email@example.com.