Alfresco Developer Guide

By Jeff Potts

About this book

Alfresco is an open source platform for Enterprise Content Management (ECM) solutions. ECM includes things like Document Management, Web Content Management, Collaboration/Enterprise 2.0, Digital Asset Management, Records Management, and Imaging. At its core is a repository for rich content like documents, web assets, XML, and multimedia. The repository is surrounded by a services layer (supporting both SOAP and REST) that makes getting content into and out of the repository a breeze, which is why so many next generation Internet solutions are built on Alfresco.

Implementing Alfresco usually involves extending the repository to accommodate your business-specific metadata and business logic. These extensions are done using some combination of Java, JavaScript, XML, and FreeMarker.

This book takes you through a set of exercises as if you were rolling out and customizing the platform for a fictional organization called SomeCo, which wants to roll out Alfresco enterprise-wide. Each department has a set of requirements that need to be addressed. We will show you how to extend Alfresco to meet these requirements. By the time you've worked through the entire book, you will be familiar with the entire platform. You'll be prepared to make your own customizations whether they are part of a Document Management solution, a web site that uses Alfresco for content storage, or an entire custom application built on Alfresco's REST API. This book will give you the knowledge and confidence you need to make Alfresco do what you need it to do.

Publication date: October 2008
Publisher: Packt
Pages: 556
ISBN: 9781847193117

 

Chapter 1. The Alfresco Platform

This chapter introduces the Alfresco platform and answers the question, "What can I do with this thing?" A few examples will be provided to help answer this question from the "solving business problems" perspective. The chapter then skims over basic configuration and customization before introducing the advanced customization concepts covered throughout the book. The chapter concludes with a brief discussion on the different Alfresco editions that are available.

In this chapter, we will go through the following points:

  • Examples of practical solutions built on Alfresco

  • High-level components of the Alfresco platform

  • Examples of the types of customizations that you will likely perform as a part of your implementation

  • Technologies you will use to extend the platform

 

Alfresco in the Real World


Alfresco will tell you that the product is a platform for Enterprise Content Management (ECM). But ECM is a somewhat nebulous and nefarious term. What does it really mean? It depends on who is saying it. ECM vendors usually use it as an umbrella term to describe a collection of content-centric technologies that includes:

  • Document Management (DM): Capturing, organizing, and sharing binary files. These files are typically produced from office productivity software, but the scope of the files being managed is unlimited.

  • Web Content Management (WCM): Managing files and content specifically intended to be delivered to the Web. The key theme of WCM is to reduce the "web developer" bottleneck and empower non-technical content owners to publish their own content.

  • Digital Asset Management (DAM): Managing graphics, video, and audio. You can think of this as DM with added functionality specific to the needs of working with rich media such as thumbnailing, transcoding, and editing. Like WCM, the intent is to streamline the production process.

  • Records Management (RM): Managing content as a legal record. Like DAM, RM starts with DM and adds functionality specific to the world of RM such as retention policies, records plans, and audit trails.

  • Imaging: This includes capturing, tagging, and routing images of documents from scanners.

Most people will also include Collaboration, Search, and occasionally, Portals as well.

Practitioners have a different perspective. They will say that ECM is less about the technology and more about how you capture, organize, and share information across the entire enterprise. For them, the "how" is more important than the "what".

What's important to know from an Alfresco perspective is that Alfresco is a platform for doing all these things.

So rather than worrying about a concise definition of ECM, let's look at a few examples to illustrate how clients are using Alfresco today, particularly in Alfresco's sweet spots such as Document Management and Web Content Management.

Basic Document Management

Alfresco started its life as a document management repository with some basic services for document management. Alfresco focused on this area initially for two reasons. First, it allowed Alfresco to establish a strong foundation and then build upon that foundation by expanding into other areas of ECM, with WCM being the prime example. Second, there is a huge market for systems that can manage unstructured content (aka "documents"). The market is so big because document management is a problem for everyone. All companies generate files that benefit from the kind of features document management provides such as check-in/check-out, versioning, metadata, security, full-text search, and workflow.

Examples of classic document management are often found in manufacturing, packaged goods, or other companies with large research and development divisions. As you can imagine, companies such as these deal with thousands of documents every day. The documents are in a variety of formats and languages, and are created and leveraged by many different types of stakeholders from various parts of the company.

The critical functionality required for basic document management includes things such as:

  • Easy integration with authoring tools: If users can't get documents into and out of the repository easily, user adoption will suffer. This means users must be able to open and save documents to the repository from applications such as Microsoft Office, Microsoft Windows Explorer, and email.

  • Security: Many documents, particularly legal documents and anything around new product development, are very sensitive. Employees must be able to log in with their normal username and password, and see only the documents they have access to.

  • Library services: This is a grouping of foundational document management functionality that includes check-in/check-out, versioning, metadata, and search. The ability to offer these library services is one of the things that sets a document repository apart from a plain file system.

  • Workflow: Quite literally, workflow describes the "flow of work" or business process related to a document. Requirements vary widely in this area and not everyone will leverage workflows right away. Workflows can be used to streamline and automate manual business processes by letting the document management system keep track of who needs to do what to a document at any particular time.

  • Scalability/Reliability: The system needs to scale in order to support several hundred or more users and hundreds of thousands or even millions of documents with some percentage of growth each year. Because the repository holds content that's critical to the business, it needs to be highly available.

  • Customizable user interface: The out of the box Alfresco web client is made for generic document management, which may be appropriate in many cases. Most clients will want to make at least some customizations to the web client to help increase productivity and improve user adoption.

The following diagram shows a high-level architecture that illustrates how basic document management might be implemented:

The diagram shows a single instance of Alfresco authenticating against LDAP. Some content managers are using the web client via HTTP/S, while others are using Windows Explorer, Microsoft Office, and other Thick Clients to work with content via one or more protocols such as CIFS, WebDAV, FTP, or SMTP. As noted in the diagram, Alfresco stores metadata in a Relational DB and the actual content files on the file system.

Most of the techniques for customizing Alfresco for DM solutions apply to other ECM solutions such as WCM, RM, Imaging, and DAM. Of course, there are business concepts and technical implementation details specific to each that make them unique, but the details provided in this book apply to all because the specialized solutions are built as extensions to the core Alfresco repository. WCM is built on the core repository as well, but the functionality it adds is significant enough to warrant a closer look.

Web Content Management

On the surface, WCM is very similar to document management. In both cases, content owners store files in a repository. Often, the content is assigned metadata, is secured, is indexed for search, and is routed through a workflow. The most obvious difference between DM and WCM is that the content being managed is meant specifically to be published on a web site or as part of a web application. Beyond that high-level distinction, there are several other differences that make WCM worthy of separate discussion. These include:

  • Content authoring tools used to create content

  • Separation of presentation from content

  • Systematic publication or deployment of content

Let's briefly look at each of these.

Content Authoring Tools

The majority of document management solutions deal with files generated by an office suite. Of course, there are exceptions such as various types of graphics files, CAD/CAM drawing formats, and other specialized tools. But mostly, the files are generated by a small number of different tools and an even smaller number of different software vendors.

In the case of WCM, there is a wide variety of tools involved from text editors to Integrated Development Environments (IDEs) to graphics programs with multiple vendors in each category. This means the WCM solution needs to be very flexible in the way it integrates with authoring tools. The alternative, which is forcing authors to give up their favorite tools in favor of a standard, can be a management nightmare.

Separation of Presentation from Content

WCM does not require the separation between content's appearance on the web site and its storage. But many implementations take advantage of this principle because it makes redesigning the site easier, facilitates multi-channel publishing, and enables people to author content without web skills.

To understand why this is so, think about a web site that has its content and presentation of that content merged together. When it is time to redesign the site, you have to touch every single web page because every page contains presentation markup. Similarly, content authoring is limited to people with technical skills. Otherwise, there is a risk that the content owner (for example, the person writing a press release or a job posting) will inadvertently clobber the page design.

One way to address this is to separate the content (the press release copy) from the presentation of that content. A common way to do that is to store the content as presentation-independent XML. The XML can then be transformed into any presentation that's needed. A redesign is as simple as changing the presentation in a single place, and then regenerating all of the pages.
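As a rough illustration of this idea, the sketch below keeps a press release as presentation-independent XML and renders it twice with the standard Java XSLT API (javax.xml.transform). The element names (pressRelease, title, body), the class name, and both stylesheets are assumptions made up for this example; they are not an Alfresco schema or an Alfresco API.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class PressReleaseRenderer {

    // Presentation-independent content (hypothetical element names).
    static final String CONTENT =
        "<pressRelease>"
        + "<title>SomeCo Opens New Office</title>"
        + "<body>SomeCo announced a new office today.</body>"
        + "</pressRelease>";

    // One presentation of the content. A redesign only touches this stylesheet.
    static final String HTML_STYLESHEET =
        "<xsl:stylesheet version=\"1.0\" xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
        + "<xsl:output method=\"xml\" omit-xml-declaration=\"yes\" indent=\"no\"/>"
        + "<xsl:template match=\"/pressRelease\">"
        + "<html><body>"
        + "<h1><xsl:value-of select=\"title\"/></h1>"
        + "<p><xsl:value-of select=\"body\"/></p>"
        + "</body></html>"
        + "</xsl:template>"
        + "</xsl:stylesheet>";

    // A second channel (plain text) produced from the same content.
    static final String TEXT_STYLESHEET =
        "<xsl:stylesheet version=\"1.0\" xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
        + "<xsl:output method=\"text\"/>"
        + "<xsl:template match=\"/pressRelease\">"
        + "<xsl:value-of select=\"title\"/><xsl:text>: </xsl:text><xsl:value-of select=\"body\"/>"
        + "</xsl:template>"
        + "</xsl:stylesheet>";

    // Applies an XSLT stylesheet to an XML document and returns the result as a string.
    static String render(String xml, String xslt) {
        try {
            Transformer transformer = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(xslt)));
            StringWriter out = new StringWriter();
            transformer.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("Transformation failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(render(CONTENT, HTML_STYLESHEET));
        System.out.println(render(CONTENT, TEXT_STYLESHEET));
    }
}
```

The content string never changes between the two calls; only the stylesheet does, which is the property that makes a redesign or a new delivery channel cheap.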

The impact of separating content from presentation is three-fold. First, assuming the content consumers aren't interested in reading raw XML, something has to be responsible for transforming the content. Depending on the implementation, it may be up to the WCM system or a frontend web application.

Second, in the case of static content, any change in the underlying content has to trigger a transformation so that the presentation will be up-to-date, keeping in mind that there may be more than one file affected by the change. For example, data from a job posting appears in the job posting detail as well as the list of job postings. If the posting and the job posting index are both static, the list has to be regenerated whenever the job posting changes.
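One simple way to picture this is as a lookup from each content item to the static pages built from it. The sketch below is only an illustration of that bookkeeping; the file paths, the class name, and the map itself are hypothetical and do not reflect how Alfresco tracks dependencies internally.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class StaticRegeneration {

    // Hypothetical dependency map: source content file -> static pages generated from it.
    static final Map<String, List<String>> DEPENDENTS = new LinkedHashMap<>();
    static {
        DEPENDENTS.put("jobs/posting-42.xml",
            List.of("jobs/posting-42.html", "jobs/index.html"));
        DEPENDENTS.put("news/release-7.xml",
            List.of("news/release-7.html", "news/index.html"));
    }

    // Returns every static page that must be regenerated when the given content changes.
    static List<String> pagesToRegenerate(String changedContent) {
        return DEPENDENTS.getOrDefault(changedContent, List.of());
    }

    public static void main(String[] args) {
        // Changing one job posting affects both its detail page and the job listing.
        System.out.println(pagesToRegenerate("jobs/posting-42.xml"));
    }
}
```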

Third, content authors lose the benefit of WYSIWYG (What You See Is What You Get) content authoring because the content doesn't immediately look the way it will as soon as it is published to the web site. The WCM system, then, has to be able to let content authors "preview" the content as they author it, preferably in the context of the entire site.

Systematic Publication or Deployment

A Document Management system is a lot like a relational database in the sense that it is typically an authoritative, centralized repository. There are exceptions, but for the most part, content resides in the repository and is retrieved by the systems and applications that need it. On the other hand, a WCM system often faces a publication or deployment challenge. Files go into the repository, but must be delivered somewhere to be consumed. This might happen on a schedule, at the request of a user, as part of a workflow, or all of the above. Granted, some web sites retrieve their content dynamically; but most sites have at least a subset of content that should be statically delivered to a web server.

Alfresco WCM Example

Let's look at an example of a basic corporate web site. Most companies have a mix of "About Us" content that probably doesn't change very often, press releases or "News" that might get updated daily, and maybe some document-based content such as marketing slicks, product information sheets, technical specifications, and so on. There's also some content that is used to build the site such as HTML, XML, JavaScript, Flash, CSS, and image files.

It is likely that there are several different teams with several different skill sets, all collaborating to produce the site. In this example, suppose the "About Us" and "News" pages come from the marketing team, the site is built by the web team, and the document-based content can come from many organizations within the company.

Alfresco WCM sits on top of the core Alfresco product to provide additional WCM-specific functionality. An important distinction between Alfresco WCM and other open source Content Management Systems is that Alfresco is a "de-coupled" CMS while something such as Drupal is a "coupled" CMS. This means that Alfresco manages the web site but does not concern itself with presentation, unlike Drupal, which is both a repository and a presentation framework. This doesn't mean that Alfresco can only manage static sites. You can easily query the repository in any number of ways. It just means it is up to you to provide the frontend from the ground up.

Using Alfresco, the WCM implementation for this example might look like this:

Note that in the diagram there is a mix of structured content (XML) and unstructured content (CSS, PNG, and PDF). The structured content gets created through Alfresco web forms and is transformed to one or more formats (in this case, JSP) using XSLT or FreeMarker. The unstructured content is simply uploaded via either the web client or CIFS.

Regardless of whether it is created with a web form or uploaded to the repository directly, the content has to make it to a web server at some point. In this example, the content is being deployed to the frontend web server using Alfresco's file deployment mechanism. In Chapter 8, other content deployment patterns will be explored.

Custom Content-Centric Applications

Content-centric applications are those in which the primary purpose of the application is to process, produce, collaborate on, or manage unstructured or semi-structured content.

The Alfresco web client is an example of a content-centric application, although it is meant for a very general, all-purpose use case. When solutions are very close to basic document management, the web client can be customized as previously discussed. At some point, it makes more sense to build a separate custom application with Alfresco as the backend repository for that application.

Consider the sales process within a company, for example. Sales people create proposals. Those proposals are usually routed internally for review and approval, and then are delivered to the client. If the client accepts the proposal, a contract is drawn up and the product is delivered. The out of the box web client could be used to manage these documents, assign metadata, manage the review process through workflows, and make it all searchable. But the sales team might be even more productive if it used a purpose-built user interface. For this solution, a frontend built with a scripting language such as PHP, a Java framework such as Seam, or even a Rich Internet Application (RIA) technology such as Flex might be a good option. Alfresco would provide the document management services. The frontend would talk to Alfresco via SOAP or RESTful services.

Another example is a "community" site. With so much buzz around Web 2.0, companies are looking for ways to add community features to their online presence such as forums, blogs, and personalized content as well as user-generated content such as comments, ratings, and rich media.

As discussed previously in the WCM section, Alfresco is very good at publishing static files to one or more web servers or application servers. What it lacks, at least in the current release, is a presentation framework. Many clients appreciate this separation because it gives them complete freedom with regard to how they build the frontend. But in the case of a community site, it would be a good thing to be spared from building the frontend from scratch.

One way to implement this kind of solution is to use an open source portal such as Liferay or JBoss Portal for the frontend. Alfresco can manage the content and also the business process used to approve that content for publication in the community site. Portlets can be written that use either SOAP-based or REST-based web services calls, to query for and display content stored in the repository.

Note that the diagram also shows a Single Sign-On (SSO) solution so that users have to log in only once when moving back-and-forth between the portal and Alfresco. This isn't strictly required, but it is worth considering, particularly with freely available open source SSO solutions such as Yale CAS.

The openness of the Alfresco repository, particularly its ability to be easily exposed as a set of services, makes Alfresco an ideal platform for content-centric applications. As the examples have shown, custom content-centric web applications use Alfresco as the backend. As a result, they have complete flexibility in frontend technology choices from portals to lower-level frameworks to no framework at all.

 

Example Used throughout This Book


In this book, we'll assume we are rolling out Alfresco throughout a consulting firm. Professional services firms make great examples because they tend to generate a variety of different documents. The other reason is that document and content management is usually a big challenge for these firms, and one that is core to the business. But the examples should be applicable to any business that generates a significant number of documents.

The example firm, SomeCo, wants to leverage document and content management throughout the organization to make it easier to find important information, streamline certain business processes, and secure sensitive documents.

SomeCo's company organization is pretty standard. It consists of Operations, Sales, Human Resources, Marketing, and Finance/Legal. Examples of the different types of content each department is concerned with are shown in the following table:

| Department | Example document types | Format and process notes |
| --- | --- | --- |
| Finance/Legal | Client Proposals for Project Work; Statements of Work; Master Services Agreements; Non-Disclosure Agreements (NDAs) | Microsoft Word and Adobe PDF. Several iterations between the firm and the client before a "final" version is completed. Some documents may require internal review and approval. |
| Marketing | Case studies; Whitepapers; Marketing plans; Marketing slicks/Promotional material | Microsoft Word, Microsoft PowerPoint, Adobe PDF, and Adobe Flash. Mostly single-author content. Some content may come from third parties. Some content may need to be published on the web site. |
| Human Resources | Job postings; Resumes; Interview feedback; Offer letters; Employee Profiles/Biographies; Project reviews; Annual reviews | Microsoft Word, Adobe PDF, and HTML. Single-author content with consumers being spread throughout the company. Some content formats are unpredictable (such as resumes). Some are very standard and could be templatized (such as offer letters). With the exception of job postings, none of this content should go near the Web. Some content needs strict internal permissions. |
| Sales | Forecast; Presentations; Proformas | Microsoft Excel and Microsoft PowerPoint. Some business process and automated document-handling possibilities such as Forecast. Searchability of presentations is important. |
| Operations | Methodology; Utilization reports; Status reports | All Microsoft Office formats. Some opportunity for integration into enterprise systems such as time tracking and project management. |

Examples throughout the rest of the book will show how Alfresco can be implemented and customized to meet the needs of the various organizations within SomeCo. During a real implementation, time would be spent gathering requirements, selecting the appropriate open source components to integrate with the solution, finalizing architecture, and structuring the project. There are plenty of other books and resources that discuss how to roll out content management across an enterprise and others that cover project methodologies. So none of that will be covered here.

 

Alfresco Architecture


Many of Alfresco's competitors (particularly in the closed-source space) have sprawling footprints composed of multiple, sometimes competing, technologies that have been acquired and integrated over time. Some have undergone massive infrastructure overhauls over the years, resulting in bizarre vestigial tails of sorts. Luckily, Alfresco doesn't suffer from these traits (at least not yet!). On the contrary, Alfresco's architecture:

  • Is relatively straightforward

  • Is built with state-of-the-art frameworks and open source components

  • Supports several important content management and related standards

Let's look at each of these characteristics, starting with a high-level look at the Alfresco architecture.

High-Level Architecture

The following diagram shows Alfresco's high-level architecture. By the time you finish this book, you'll be intimately familiar with just about every box in the diagram:

The important takeaways at this point are as follows:

  • There are many ways to get content into or out of the repository, whether that's via the protocols on the left side of the diagram or the APIs on the right.

  • Alfresco runs as a web application within a servlet container. In the current release, the web client runs in the same process as the content repository.

  • Customizations and extensions run as part of the Alfresco web application. An extension mechanism separates customizations from the core product to keep the path clear for future upgrades.

  • Metadata resides in a Relational DB while content files and Lucene index reside on the file system. The diagram shows the content residing on the same physical file system as Alfresco, but other types of file storage could be used as well.

  • The WCM Virtualization Server is an instance of Tomcat with Alfresco configuration and JAR files. The Virtualization Server is used to serve up live previews of the web site as the site is being worked on. It can run on the same physical machine as Alfresco or can be split out to a separate node.

Add-Ons

The "Add-ons" are pieces of functionality not found in the core Alfresco distribution. If you are working with the binary distribution, it means you'll have additional files to download and install on top of the base Alfresco installation.

Add-ons are provided by Alfresco, third-party software vendors, and members of the Alfresco community such as partners and customers. Alfresco makes several add-on modules available such as Records Management and Facebook integration. Software vendor Kofax provides add-on software that integrates Alfresco with the Kofax imaging solution. Members of the Alfresco community create and share add-on modules via the Alfresco Forge, a web site set up by Alfresco for that purpose. But the majority of what is available is language packs used to localize the Alfresco web client.

Open Source Components

One of the reasons Alfresco has been able to create a viable offering so quickly is that it didn't start from scratch. The Alfresco engineers assembled the product from many finer-grained open source components. Why does this matter? First, instead of reinventing the wheel, they used proven components. This saved them time, but it also resulted in a more robust, more standards-based product. Second, it eases the transition for people new to the platform. If a developer already knows JavaServer Faces (JSF) or Spring, for example, many of the customization concepts are going to be familiar. (And besides, as a developer, wouldn't you rather invest your time and effort in learning standard development frameworks rather than proprietary "development kits"?)

The following table lists some of the major open source components used to build Alfresco:

| Open Source Component | Use in Alfresco |
| --- | --- |
| Apache Lucene (http://lucene.apache.org/) | Full-text and metadata search |
| Hibernate (http://www.hibernate.org/) | Database persistence |
| Apache MyFaces (http://myfaces.apache.org/) | JSF components in the web client |
| FreeMarker (http://freemarker.org/) | Web script framework views, custom views in the web client, web client dashlets, email templates |
| Mozilla Rhino JavaScript Engine (http://www.mozilla.org/rhino/) | Web script framework controllers, Server-side JavaScript, Actions |
| OpenSymphony Quartz (http://www.opensymphony.com/quartz/) | Scheduling of asynchronous processes |
| Spring ACEGI (http://www.acegisecurity.org/) | Security (authorization), roles, and permissions |
| Apache Axis (http://ws.apache.org/axis/) | Web services |
| OpenOffice.org (http://www.openoffice.org/) | Conversion of office documents into PDF |
| Apache FOP (http://xmlgraphics.apache.org/fop/) | Transformation of XSL:FO into PDF |
| Apache POI (http://poi.apache.org/) | Metadata extraction from Microsoft Office files |
| JBoss jBPM (http://www.jboss.com/products/jbpm) | Advanced workflow |
| ImageMagick (http://www.imagemagick.org) | Image file manipulation |
| Chiba (http://chiba.sourceforge.net/) | Web form generation based on XForms |

Does this mean you have to be an expert in all open source components used to build Alfresco to successfully implement and customize the product? Not at all! Developers looking to contribute significant product enhancements to Alfresco or those making major, deep customizations to the product may require experience with a particular component, depending on exactly what they are trying to do. Everyone else will be able to customize and extend Alfresco using basic Java and web application development skills.

Major Standards and Protocols Supported

Software vendors love buzz words. As new acronyms climb the hype cycle, vendors scramble to figure out how they can at least appear to support the standard or protocol so that prospective clients can check that box on the Request for Proposal (RFP) (don't even get me started on RFPs). Commercial open source vendors are still software vendors and thus are no less guilty of this practice. But because open source software is developed in the open by a community of developers, its compliance with standards tends to be more genuine. It makes more sense for an open source project to implement a standard than to go off in some new direction because doing so saves time, promotes interoperability with other open source projects, and stays true to what open source is all about: freedom and choice.

Here are the significant standards and protocols Alfresco supports:

| Standard/Protocol | Comment |
| --- | --- |
| FTP | Content can be contributed to the repository via FTP. Secure FTP is not yet supported. |
| WebDAV | WebDAV is an HTTP-based protocol commonly supported by content management vendors. It is one way to make the repository look like a file system. |
| CIFS | CIFS allows the repository to be mounted as a shared drive by other machines. As opposed to WebDAV, systems (and people) can't tell the difference between an Alfresco repository mounted as a shared drive through CIFS and a traditional file server. |
| JCR API (JSR-170) | JCR is a Java API for working with content repositories such as Alfresco. Alfresco is a JCR-compliant repository. There are two levels of JCR compliance. Alfresco is Level 1 compliant and is near to Level 2 compliant. |
| Portlet API (JSR-168) | The Web Script Framework lets you define a RESTful API to the repository. Web Scripts can return XML, HTML, JSON, and JSR-168 portlets. In the current release, this requires the portal and Alfresco to be running in the same JVM, but this restriction may go away in the near future. |
| SOAP | The Alfresco Web Services API uses SOAP-based web services. |
| OpenSearch (http://www.opensearch.org) | Alfresco repositories can be configured as an OpenSearch data source, which allows Alfresco to participate in federated search queries. OpenSearch queries can be executed from the web client as well. This means if your organization has several repositories that are OpenSearch-compliant (including non-Alfresco repositories), they can be searched from within the web client. |
| XForms, XML Schema | Web forms are defined using XML Schema. Not all XForms widgets are supported. |
| XSLT, XSL:FO | Web form data can be transformed using XSL 1.0. |
| LDAP | Alfresco can authenticate against an LDAP directory or a Microsoft Active Directory server. |
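
To make the OpenSearch entry a little more concrete: an OpenSearch-compliant source advertises a URL template containing placeholders such as {searchTerms} and {count}, where a trailing "?" marks a placeholder as optional. A federated search client fills in the same kind of template for every repository it queries. The sketch below shows that substitution step in plain Java. The template URL and class name are hypothetical and are not Alfresco's actual search endpoint.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class OpenSearchTemplate {

    // Matches placeholders such as {searchTerms} or {startPage?}.
    private static final Pattern PARAM = Pattern.compile("\\{([A-Za-z:]+)(\\??)\\}");

    // Replaces each placeholder with its URL-encoded value.
    // Optional placeholders without a value become empty strings;
    // a missing required placeholder is an error.
    static String fill(String template, Map<String, String> values) {
        Matcher m = PARAM.matcher(template);
        StringBuilder out = new StringBuilder();
        while (m.find()) {
            String name = m.group(1);
            boolean optional = !m.group(2).isEmpty();
            String value = values.get(name);
            if (value == null) {
                if (!optional) {
                    throw new IllegalArgumentException("Missing required parameter: " + name);
                }
                value = "";
            }
            String encoded = URLEncoder.encode(value, StandardCharsets.UTF_8);
            m.appendReplacement(out, Matcher.quoteReplacement(encoded));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        // Hypothetical template, as a repository might advertise it.
        String template = "http://repo.example.com/search?q={searchTerms}&p={startPage?}&c={count?}";
        System.out.println(fill(template, Map.of("searchTerms", "annual review", "count", "10")));
    }
}
```

Because every compliant repository describes its search URL the same way, one client routine like this can query several repositories, which is what makes federated search practical.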

 

Customizing Alfresco


Alfresco offers a significant amount of functionality out of the box, but most clients will customize it in some way. At a high level, the types of customizations typically done during an implementation can be divided into basic customizations and advanced customizations.

Basic Customization

Many Alfresco customizations can be done without writing a single line of code. Some may be done even by end users through the web client. Others might require editing a properties file or an XML file. These basic configuration and customization tasks are described in-depth in Alfresco Enterprise Content Management Implementation by Munwar Shariff, Packt Publishing. Let's look at them briefly here so that you can get an idea of what you don't have to code.

Dashlets

When users log in to Alfresco, the first thing that is usually displayed is the My Alfresco Dashboard. The dashboard is a user-configurable layout that contains dashlets. (If you are familiar with portals, think "portal page" and "portlet".) Users choose the layout of the dashboard (number of columns) as well as the specific dashlets they want displayed in each column.

There are a number of dashlets available out of the box, or you can develop your own and add them to the user-selectable list. Examples of out of the box dashlets include workflow-related dashlets such as My Tasks To Do and My Completed Tasks as well as content-related dashlets such as My Documents and My Spaces:

A dashlet is implemented as a JSP page. The JSP page can contain JSF components and make calls to JSF-managed beans. If FreeMarker is more your style, the JSP page can easily delegate its rendering to a FreeMarker template. Developing custom dashlets is not something you'd let your business users do, but it is still considered a basic customization.

Custom Views

Alfresco's web client uses a hierarchical folder metaphor for navigating the repository. Alfresco calls folders "spaces" because in Alfresco, folders can do more than just contain documents. They can also have metadata, rules, and permissions associated with them. I will use "folders" and "spaces" interchangeably throughout the book. The default behavior when a folder is opened is to display the contents of the folder. A common requirement is to display metadata or other information that's not available in the standard content list. One way to do this is to implement a custom view using FreeMarker templates. The custom view can then be applied to the folder by a business user to display it as needed, without changing the underlying folder list functionality.

There are several out of the box FreeMarker templates that can be used as custom views such as "My Documents" and "Recent Documents". Most likely, you'll want to create your own using one of the out of the box templates as an example.

Rules and Rule Actions

A rule is something that says, "When a piece of content is created, updated, or deleted, check these conditions. If these conditions are met, take these actions". Conditions may check whether a piece of content is a particular MIME type, or a specific content type. They may also check whether a piece of content has a specific aspect applied, or whether the content's name property matches a particular pattern. Rules can be defined on any folder in the repository. Child folders can inherit rules from their parent.

Rule actions are repeatable operations that enable us to do things similar to those that can be done using JavaScript or Java. Out of the box actions include things such as check-in content, check-out content, move an item to another folder, specialize the type of the content, add an aspect to the content, transform content from one format to another, and so on.
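To make the "event, conditions, actions" idea concrete, here is a minimal sketch in plain Java. It is not the Alfresco API: the RuleSketch class, the Content and Rule types, the sc:invoice aspect, and the /Finance/Invoices folder are all hypothetical names invented for illustration. The rule fires only when the triggering event matches and every condition holds, and then runs each action in order.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;
import java.util.function.Predicate;

// Hypothetical, simplified model of a folder rule; not the Alfresco API.
final class RuleSketch {

    /** A piece of content reduced to a name, a MIME type, a folder, and its aspects. */
    static final class Content {
        final String name;
        final String mimeType;
        final List<String> aspects = new ArrayList<>();
        String folder;

        Content(String name, String mimeType, String folder) {
            this.name = name;
            this.mimeType = mimeType;
            this.folder = folder;
        }
    }

    enum Event { CREATED, UPDATED, DELETED }

    /** On a given event, if every condition holds, run every action. */
    static final class Rule {
        private final Event trigger;
        private final List<Predicate<Content>> conditions = new ArrayList<>();
        private final List<Consumer<Content>> actions = new ArrayList<>();

        Rule(Event trigger) { this.trigger = trigger; }

        Rule when(Predicate<Content> condition) { conditions.add(condition); return this; }

        Rule then(Consumer<Content> action) { actions.add(action); return this; }

        /** Returns true if the rule fired, false if the event or a condition did not match. */
        boolean apply(Event event, Content content) {
            if (event != trigger) return false;
            for (Predicate<Content> condition : conditions) {
                if (!condition.test(content)) return false;
            }
            for (Consumer<Content> action : actions) {
                action.accept(content);
            }
            return true;
        }
    }

    /** Example rule: new PDFs named "invoice..." get an aspect and are moved. */
    static Rule invoiceRule() {
        return new Rule(Event.CREATED)
            .when(c -> "application/pdf".equals(c.mimeType))        // MIME type condition
            .when(c -> c.name.toLowerCase().startsWith("invoice")) // name pattern condition
            .then(c -> c.aspects.add("sc:invoice"))                 // hypothetical aspect
            .then(c -> c.folder = "/Finance/Invoices");             // hypothetical target folder
    }
}
```

In the real product, a rule like this is configured through the web client rather than coded. The sketch only shows the evaluation order a rule implies.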

Configuring folders to run rule actions is something non-technical users can do through the web client. In Chapter 4, you'll learn how to write your own custom rule actions using the Alfresco API.

Simple Workflow

Alfresco has two options for implementing workflow: simple workflow or advanced workflow. The good thing about simple workflows is that end users can configure them as needed without any technical skills or developer support.

Here's how it works. A web client user creates a rule to "add a simple workflow" to a document when it is placed in the folder. When a document has a simple workflow, it means it has a "forward step" and a "backward step". A user configuring the simple workflow decides whether to use one or both steps, and assigns appropriate names to the steps, such as "Approve" and "Reject". When the step is invoked, the content can be copied or moved to another folder. Users create serial processes by setting up multiple folders, each with rules to add the appropriately configured simple workflow to the incoming content. For example, there might be folders called "Draft", "In Review", and "Approved". The state of a document is determined by the folder in which it resides.
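The folder-as-state idea can be sketched in a few lines of plain Java. This is an illustration of the semantics only, not how Alfresco implements simple workflow. The SimpleWorkflowSketch class and its methods are hypothetical, and the folder names are the ones from the example above.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of simple-workflow semantics; not the Alfresco API.
final class SimpleWorkflowSketch {

    /** Folders in process order. A document's state is the folder it sits in. */
    static final List<String> FOLDERS = Arrays.asList("Draft", "In Review", "Approved");

    static final class Document {
        final String name;
        String folder;

        Document(String name, String folder) {
            this.name = name;
            this.folder = folder;
        }
    }

    /** Forward step (for example, "Approve"): move to the next folder, if there is one. */
    static boolean forward(Document doc) { return move(doc, +1); }

    /** Backward step (for example, "Reject"): move to the previous folder, if there is one. */
    static boolean backward(Document doc) { return move(doc, -1); }

    /** Moves one step at a time; returns false when no such step exists. */
    private static boolean move(Document doc, int delta) {
        int current = FOLDERS.indexOf(doc.folder);
        int target = current + delta;
        if (current < 0 || target < 0 || target >= FOLDERS.size()) {
            return false;
        }
        doc.folder = FOLDERS.get(target);
        return true;
    }
}
```

Notice that the document has no state other than its location, and that it can only move one step at a time.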

Simple workflows have obvious limitations:

  • Workflows are limited to serial processes. Content can only move forward or backward, one step at a time.

  • Content can only be in one process at a given time.

  • Content must change physical locations to reflect a change in state.

  • There is no opportunity for capturing (and acting on) process-related metadata.

  • Tasks can't be assigned to individuals or groups. (Of course, you could limit folders to specific individuals or groups through permissions, which would have a similar effect to a task assignment. But you wouldn't be able to easily pull a list of tasks for a specific user across all simple workflows).

  • Other than creating additional rules and rule actions for the folders used in a simple workflow, there is no way to add logic for decisions or other more complex constructs.

Advanced Customization

The basic configuration and customizations show that there is quite a lot of tweaking and tailoring that can happen before a developer gets involved. This is a good thing. It means a good chunk of the customization requirements can be dealt with quickly. In the case of simple workflows, they can be delegated to the end users altogether! Hopefully, this leaves more time for the more advanced (and more interesting) customizations required for a successful implementation.

Examples of Advanced Customizations

The advanced customizations are the customizations that are likely to require code. They are the focus of the book. To give you an idea of what's possible (and in an effort to provide an appetizer before the main meal is served), let's go over some of the areas of advanced customization.

Extend the Content Model

Alfresco's out of the box content model can be extended to define your own content types, content aspects, content metadata (properties), and relationships (associations). The out of the box model is very generic, and defines only a minimal subset of the metadata that will probably need to be captured with the content.

For example, SomeCo might want to capture different metadata for its Marketing documents than for its HR documents. Or maybe there is a set of metadata that doesn't belong to any one content type in particular, but should rather be grouped together in an aspect and attached to objects as needed. These and other content modeling concepts will be covered in Chapter 3.
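Before getting to Chapter 3, it may help to see how types and aspects combine. The following plain Java sketch is not Alfresco's model format (real models are defined in XML), and every name in it, including sc:doc, sc:marketingDoc, sc:hrDoc, sc:webable, and their properties, is a hypothetical example. It shows that a piece of content ends up with the properties of its type, the properties inherited from parent types, and the properties of any aspects attached to it.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical, simplified model of types and aspects; not the Alfresco API.
final class ContentModelSketch {

    /** A content type with an optional parent type and its own properties. */
    static final class TypeDef {
        final String name;
        final TypeDef parent; // null for a root type
        final List<String> properties;

        TypeDef(String name, TypeDef parent, String... properties) {
            this.name = name;
            this.parent = parent;
            this.properties = Arrays.asList(properties);
        }
    }

    /** An aspect: a named group of properties that can be attached to any object. */
    static final class AspectDef {
        final String name;
        final List<String> properties;

        AspectDef(String name, String... properties) {
            this.name = name;
            this.properties = Arrays.asList(properties);
        }
    }

    /** Properties from the type, its ancestors, and every attached aspect. */
    static Set<String> effectiveProperties(TypeDef type, AspectDef... aspects) {
        Set<String> props = new LinkedHashSet<>();
        for (TypeDef t = type; t != null; t = t.parent) {
            props.addAll(t.properties);
        }
        for (AspectDef aspect : aspects) {
            props.addAll(aspect.properties);
        }
        return props;
    }

    // Hypothetical SomeCo model: a base type, two departmental types, one aspect.
    static final TypeDef SC_DOC = new TypeDef("sc:doc", null, "sc:owner");
    static final TypeDef MARKETING_DOC = new TypeDef("sc:marketingDoc", SC_DOC, "sc:campaign");
    static final TypeDef HR_DOC = new TypeDef("sc:hrDoc", SC_DOC, "sc:employeeId");
    static final AspectDef WEBABLE = new AspectDef("sc:webable", "sc:published", "sc:isActive");
}
```

A marketing document with the aspect attached carries the base, marketing, and aspect properties, while an HR document without the aspect carries only the base and HR properties.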

Perform Automatic Operations on Content

There are several "hooks" or places where you can insert logic or take action to handle content automatically. These include rule actions, behaviors, content transformers, and metadata extractors. Rule actions have already been discussed. Behaviors are like actions but instead of being something that an end user can invoke on any piece of content, behaviors are tightly bound to their content type. Content transformers, as the name suggests, transform content from one format to another. Metadata extractors inspect content as it is added to the repository, and pull out data to store in the content object's properties. These tools for handling content automatically will all be covered in Chapter 4.
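The difference between an action and a behavior is easier to see in code. The sketch below is plain Java with hypothetical names (BehaviorSketch, bindOnCreate, createNode, the sc:whitepaper type, and the sc:reviewed property); it does not use the Foundation API, which Chapter 4 covers. A behavior is registered against a content type and runs automatically whenever a node of that type is created, without any user invoking it.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

// Hypothetical sketch of type-bound behaviors; not the Alfresco Foundation API.
final class BehaviorSketch {

    static final class Node {
        final String type;
        final Map<String, Object> properties = new HashMap<>();

        Node(String type) { this.type = type; }
    }

    /** Behaviors to run on node creation, keyed by content type. */
    private final Map<String, List<Consumer<Node>>> onCreate = new HashMap<>();

    /** Binds a behavior to a content type. */
    void bindOnCreate(String type, Consumer<Node> behavior) {
        onCreate.computeIfAbsent(type, k -> new ArrayList<>()).add(behavior);
    }

    /** Creates a node and fires only the behaviors bound to its exact type. */
    Node createNode(String type) {
        Node node = new Node(type);
        List<Consumer<Node>> bound = onCreate.getOrDefault(type, Collections.emptyList());
        for (Consumer<Node> behavior : bound) {
            behavior.accept(node);
        }
        return node;
    }
}
```

For example, binding `node -> node.properties.put("sc:reviewed", Boolean.FALSE)` to sc:whitepaper would initialize that property on every new whitepaper and leave nodes of other types untouched. This sketch matches on the exact type only and ignores type inheritance.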

Customize the Web Client

Chapter 5 covers web client customization. Just about everything in the web client can be tweaked. Menu items can be rearranged or new menus and items can be created. If a JSP doesn't work quite the way it needs to, you can override it with your own custom version of the page. Don't like the out of the box date picker? Component renderers for out of the box data types can be overridden or completely new renderers for custom types can be created. If you need to guide users through a multi-step process, you can create custom dialogs and wizards.

In the current release, the web client is built with JSF. In the near future, Alfresco may be moving toward a lighter-weight framework based on web scripts. Regardless of what happens with the new and improved web client, the "classic" web client will be around for a while, so it is a good idea to know how to make it fit your requirements.

Create a RESTful API

Web scripts are one of the more exciting additions to the Alfresco architecture. The Next Generation Internet (NGI) or Web 2.0 (or 3.0 or whatever you want to call it) is built on RESTful services. The reason is that RESTful services are typically much easier to work with using scripting languages and AJAX toolkits than SOAP-based services, because they are invoked through plain old URLs. When a repository has a RESTful interface, it is much easier to incorporate as part of an NGI solution.

The web script framework, based on the Model-View-Controller (MVC) pattern, allows you to build your own RESTful API to the repository. It will be covered in detail in Chapter 6, but the high-level summary is that URLs get mapped to a controller implemented in JavaScript or Java. The controller performs whatever logic is needed, then forwards the request to the view. The view is implemented as a FreeMarker template. The template can return anything from markup to XML to JSON. The framework is so powerful and flexible that Alfresco is refactoring several pieces of the web client to leverage web scripts. RESTful services via web scripts are well on their way to becoming the preferred way to integrate with the Alfresco repository.
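The URL-to-controller-to-view flow can be sketched without the real framework. In the plain Java below, the WebScriptSketch class, its register and get methods, and the /someco/hello URL are all hypothetical, and a Java lambda stands in for the FreeMarker template that a real web script would use as its view. The controller turns request arguments into a model, and the view turns the model into a response body.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical sketch of the web script request flow; not the Alfresco framework.
final class WebScriptSketch {

    /** Controller: receives request arguments and returns a model. */
    interface Controller extends Function<Map<String, String>, Map<String, Object>> {}

    /** View: renders a model into a response body (a template in a real web script). */
    interface View extends Function<Map<String, Object>, String> {}

    private final Map<String, Controller> controllers = new HashMap<>();
    private final Map<String, View> views = new HashMap<>();

    /** Maps a URL to a controller and the view it forwards to. */
    void register(String url, Controller controller, View view) {
        controllers.put(url, controller);
        views.put(url, view);
    }

    /** Dispatches a GET: the controller builds the model, then the view renders it. */
    String get(String url, Map<String, String> args) {
        Controller controller = controllers.get(url);
        if (controller == null) {
            return "404 Not Found";
        }
        Map<String, Object> model = controller.apply(args);
        return views.get(url).apply(model);
    }

    /** Example: a greeting service that returns JSON (no escaping; illustration only). */
    static WebScriptSketch example() {
        WebScriptSketch scripts = new WebScriptSketch();
        scripts.register("/someco/hello",
            args -> {
                Map<String, Object> model = new LinkedHashMap<>();
                model.put("name", args.getOrDefault("name", "world"));
                return model;
            },
            model -> "{\"greeting\": \"Hello, " + model.get("name") + "\"}");
        return scripts;
    }
}
```

Because the view is the only piece that knows the output format, the same controller could serve markup, XML, or JSON by swapping the view.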

Streamline Complex Business Processes with Advanced Workflows

Advanced workflows provide a way to automate complex business processes. Alfresco's advanced workflows are executed by the embedded JBoss jBPM engine, which is a very powerful and popular open source workflow engine.

Unlike simple workflows, which are end-user configurable and limited to serial processes, advanced workflows offer the power of parallel flows, the ability to add logic to the process via JavaScript and Java, and much more.

A handful of advanced workflows are available out of the box. These are most useful as starting points for your own custom advanced workflows. How to build your own will be covered in Chapter 7.

Get Your Web Sites under Control

Alfresco WCM uses the same web client user interface as everything else in Alfresco, so customization techniques covered in other chapters will apply here. Chapter 8 focuses on specific WCM implementation details such as creating web forms with XML Schema and presentation template development using XSLT and FreeMarker.

Integrate with Other Systems

Most of the coding and configuration discussed so far can be divided into two parts: (1) Customizations made to the core repository and (2) Customizations made to the web client. There is a third bucket to be considered, which is coding and configuration related to integrating Alfresco with other solutions. Maybe Alfresco needs to authenticate against an LDAP directory. Maybe a portal will get its content from Alfresco, or perhaps some other third-party application needs to share content with Alfresco. Chapter 9 discusses how to handle security and integration.

 

Dusting Off Your Toolbox


Looking across both the basic and advanced customizations provides some idea about the extensibility of the platform. A commonly asked question at this point in the architecture discussion is, "Does Alfresco have an API?" Actually, it has several. Let's look at what APIs are available and where they are used. This should also give you some idea as to the tools and skills you'll need to have in your toolbox as you embark on your own projects.

The following table shows the APIs available and where they are used:

| Alfresco API | Where Used | Comments |
| --- | --- | --- |
| Foundation API | Rule actions, behaviors, Java-based web scripts, web client customizations, jBPM, standalone applications that embed the Alfresco repository. | As the name suggests, this is the core Alfresco API. Most of the work with this API involves writing Java in Plain Old Java Objects (POJOs) that are "wired in" to Alfresco via Spring- or JSF-managed beans. |
| Web Services API | Web and non-web applications that need remote access to the repository. | Alfresco ships client-side classes for Java and PHP, but any language that can use SOAP-based web services can use this API to do almost everything the Foundation API can do. |
| JCR API | Web and non-web applications. Can be used remotely via the JCR-RMI bridge. | JCR is a standard (JSR-170) Java API for interacting with content repositories. The JCR API does not have the full functionality of the Foundation API. |
| FreeMarker API | Custom views, mail templates, web script view logic, WCM presentation transformations. | FreeMarker is an open source templating engine. |
| AVMRemote API | WCM presentation transformations, web applications. | This API is specific to working with content stored in Alfresco WCM web projects. |
| Web Script Framework | Web and non-web applications that need to use REST to interact with the repository. | More of a framework than an API, web scripts implement a Model-View-Controller (MVC) pattern that relies on the JavaScript, FreeMarker, and Foundation APIs. |
| Flex API | Web scripts, Flash components. | Built on the web script framework, the Flex API is really a set of hooks that make it easier to use Adobe's Flex tools to build Rich Internet Applications (RIAs) on top of Alfresco. |
| Facebook API | Web scripts, social networking applications. | Similar to the Flex API, the Facebook API is a set of web scripts that make it easier for Alfresco-based web scripts to make calls to the Facebook API. |

As the list of APIs shows, knowing Java will be the key to just about any successful customization effort. FreeMarker and JavaScript are important, but are easily picked up using Alfresco's code and online resources as references.

What about Adobe Flex?

Alfresco has a vision for a web client with a much richer interface. At one point, the plan was to build the web client entirely with Adobe Flex. Alfresco has since backed off that approach. It is more likely that Flash components will be added where it makes sense.

From a skills standpoint, it is still uncertain how deep Flex skills will need to be to customize Alfresco as it evolves into a richer interface. Hopefully, Alfresco will abstract the configuration and customization of the Flex-based components so that clients can customize them without Flex skills. If that doesn't happen, it should be fairly easy for anyone with knowledge of JavaScript and XML to pick up Flex skills.

 

Understanding Alfresco's Editions


Alfresco has two editions of its products (sometimes called "networks"): Labs and Enterprise. It also offers a "Small Business Network" package through the Red Hat Exchange, but this is essentially a user-limited Enterprise version licensed on a "per seat" rather than a "per CPU" basis.

Those familiar with the difference between Fedora Linux and Red Hat Enterprise Linux, or JBoss.org and JBoss.com will immediately understand the distinction between the Alfresco Labs and Alfresco Enterprise editions. Both editions are open source and are available without up-front license fees. However, the Labs edition is completely unsupported while Alfresco provides commercial support for the Enterprise edition. In fact, you can't get access to the Enterprise edition without purchasing a support subscription from Alfresco.

The Labs edition is essentially the developers' playground. It may contain experimental features and community contributions. In source code terms, it can be thought of as the "daily build" or the "unstable build". Therefore, it should not be used in critical applications because it changes quite often. From time to time, functionality will be taken from Labs and placed in the Enterprise code line where it will be integrated with the rest of the product, tested, and officially released as a new supported version.

Initially, the Enterprise edition incorporated every feature available in Labs because the two were parts of the same code line. However, this has changed. The two are now separate code lines. There is no guarantee that a feature in Labs will ever make it to Enterprise. But if there is a good reaction to the functionality among Labs users, if the functionality is being demanded by Enterprise customers, and if the code plays well with the Enterprise code base, it is likely to be made part of the Enterprise release at some point. This means you should be very careful if you choose to put solutions based on Labs in front of your users. If they fall in love with a feature unique to the Labs edition and then demand commercial support from Alfresco, you might find yourself in a very tough position.

Significant Feature Differences

At the time of this writing, the latest supported release from Alfresco is Alfresco Enterprise 2.2. The latest community release is Alfresco Labs 3.0 Preview. Of course, there are many feature differences between the two. The most significant difference is that the Labs edition includes the Flex and Facebook APIs as well as the new Surf web framework, and the new 3.0 web client called Share.

What's Used in This Book

The vast majority of examples used in this book will work on both the Enterprise and Labs editions (2.2 and 3.0, respectively). Where a specific release is required, it will be noted wherever possible.

 

Summary


Hopefully, this chapter has given you several ideas about how Alfresco can be used to implement Document Management, Web Content Management, and custom content-centric applications by walking through examples of each. The details may still be fuzzy, but the goal was to introduce the major components and capabilities of the Alfresco platform.

The key points covered in this chapter were:

  • Alfresco can be used to solve a variety of content-related business problems from document management to web content management to workflow and collaboration.

  • Throughout the rest of the book you'll customize and extend Alfresco to meet the needs of SomeCo, a fictitious consulting firm.

  • Alfresco is assembled with open source components, runs as a web application within an application server, and exposes the repository through many different protocols and APIs.

  • Alfresco can be customized. Some types of customization are very basic (more configuration than customization) and can be performed by end users through the web client. Others are more advanced and require coding. The advanced customizations are the subject of this book.

  • The most common tools used to extend the platform are Java, JavaScript, FreeMarker, and XML.

  • The two flavors or editions of Alfresco, Labs and Enterprise, are somewhat analogous to Fedora and Red Hat Enterprise Linux. Labs is the "daily build", primarily for developers and experimentation, while Enterprise is for production systems.

About the Author

  • Jeff Potts

    Jeff Potts is the founder of Metaversant Group, Inc., a consulting firm focused on content management, search, and workflow. Jeff brings over 20 years of Enterprise Content Management implementation experience to organizations of all sizes including the Fortune 500. Throughout his consulting career he has worked on a number of projects for clients across the media and entertainment, airline, consumer packaged goods, and retail sectors.

    Jeff began working with and blogging about Alfresco in November of 2005. In 2006 and 2007, he wrote a series of Alfresco tutorials and published them on his blog, ecmarchitect.com. That work, together with other Community activity in Alfresco's forum, Wiki site, and JIRA earned him Alfresco's 2007 Community Contributor of the Year Award.

    In the past, Mr. Potts has worked for Alfresco Software, Inc. as Chief Community Officer, Optaros as Senior Practice Director, and Hitachi Consulting as Vice President where he ran the ECM practice.
