There are some common features in providing website content, but also many differences. Applications easily become complex as they tackle real-world problems, and there has been much real innovation in web systems. So the areas to look at in this article are:
- Major areas for content development
- A review of minor yet important areas
- How a simple text manager is built
- An outline of a complex content delivery extension
Discussion and considerations
Now, we will work through the major areas of website content, devoting a section to each one. A round up of some less important aspects of content completes the discussion, leaving us ready to move on to details of implementation.
Articles, blogs, magazines, and FAQ
The most basic requirement is for text and pictures, and the simplest scheme needs little more than the standard database and a WYSIWYG editor. An extension that works at this level is illustrated later in the article. It is pretty much essential to have an ability to create items of this kind in an unpublished state so that they can be revised until ready for use. The state is then changed to published. Almost immediately, a further requirement arises to specify a range of publication dates, so that material aimed at a specific event can be automatically published at the appropriate time. Likewise, it is desirable to have an automatic mechanism for removing information that is no longer current, for example because it refers to a coming event in terms that will be irrelevant once the event has passed. A website that carries plainly obsolete articles is unlikely to be popular!
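The published state and the range of publication dates combine into a single visibility test. The following is a minimal sketch in Python, not taken from any particular CMS; the `Article` class and its field names are assumptions made for illustration.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class Article:
    title: str
    published: bool = False                   # editorial state flag
    publish_from: Optional[datetime] = None   # None means no start limit
    publish_until: Optional[datetime] = None  # None means never expires


def is_visible(article: Article, now: datetime) -> bool:
    """Return True if the article should be shown to visitors at `now`."""
    if not article.published:
        return False
    if article.publish_from is not None and now < article.publish_from:
        return False
    if article.publish_until is not None and now >= article.publish_until:
        return False
    return True
```

Because the test is applied when a page is built, nothing has to run at the moment an article becomes current or expires; the dates alone decide what is shown.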
There are many ways to organize textual material. One is to place it into some kind of tree structure, rather akin to the classification schemes used in libraries. Ideally, such a scheme has no particular constraints on the depth of the tree structure. A concern with this approach is that it can quickly lead to a conflict between two alternative uses—classification according to subject and classification according to reader permissions. An option that can be used in conjunction with a tree structure is to use some form of tagging. This introduces much greater flexibility in some respects, as it is easy to apply multiple tags to a single item of content, which can therefore be classified in a wide variety of ways, and can appear under multiple headings.
A blog is an example of a system that might work best with a combination of a classification tree and a tagging scheme. Where there are several people creating blogs, the different authors fit well with a tree structure, since there is no question of an item belonging to more than one author. On the other hand, items are often tagged according to their subject matter, and several tags may be applicable to an individual article. If authors create more than one blog and there are questions about which visitors are able to see which blog, then careful thought needs to be given as to whether the split of blogs is best handled by the classification tree or by tagging. Using a tree achieves rigid separation, and is easily amenable to imposing access controls. But if the same item appears in more than one blog, then tagging works better as the item is ideally stored only once but has multiple tags. Blogs also frequently provide for comments, discussed in the next section.
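To make the contrast concrete, here is a small Python sketch in which every item has exactly one place in the tree but any number of tags. The class and method names are hypothetical, and a real system would keep this information in database tables rather than in memory.

```python
from collections import defaultdict


class ContentIndex:
    """Each item has exactly one tree position (a path) and any number of tags."""

    def __init__(self):
        self._path_of = {}                     # item id -> path tuple
        self._items_by_tag = defaultdict(set)  # tag -> set of item ids

    def add(self, item_id, path, tags=()):
        self._path_of[item_id] = tuple(path)
        for tag in tags:
            self._items_by_tag[tag].add(item_id)

    def under(self, prefix):
        """Items stored at or below a tree node, at any depth."""
        prefix = tuple(prefix)
        return sorted(
            item for item, path in self._path_of.items()
            if path[:len(prefix)] == prefix
        )

    def tagged(self, tag):
        """Items carrying the given tag, wherever they sit in the tree."""
        return sorted(self._items_by_tag.get(tag, ()))
```

An item filed under one author's node can still turn up under several subject tags, which is the combination suggested above for blogs.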
A magazine is typically a collection of articles. For a simple case, it might be adequate for the articles of the magazine to be equated to website pages, but a more sophisticated magazine would want to avoid restrictions of that kind. The basic unit of content would still need to be an individual article, but website pages then require some kind of template to build a page from multiple items.
One popular application for quite simple content is the compilation of frequently asked questions (FAQs). Advanced implementations might be described more grandly as knowledge bases. Again, both a classification tree and tagging can be relevant, but a useful FAQ (and especially one that wants to be a knowledge base) also needs effective search facilities so that information can be easily found.
In all of these cases, added complexity arises if facilities like versioning are needed. Another similar issue is the need for workflow and differing roles, such as authors and editors. Mention of roles suggests an RBAC mechanism. It seems unlikely that one single model will ever meet every requirement in areas such as versioning and workflow. Version control can become extremely complex, and usually requires the allocation of roles that involve access rights and functional capabilities. Workflow is much the same. In both cases, though, simple and rigid schemes are liable to create problems. For example, the same person is quite likely to be an author in some situations, and an editor or publisher in others. A flexible and efficient RBAC system is a prerequisite for handling these problems, but as discussed earlier, the technical provision of RBAC is only a start. Applying it to particular systems and creating an appropriate user interface is a considerable challenge.
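One way to let the same person be an author in one place and an editor in another is to grant roles per subject rather than globally. The Python sketch below illustrates the idea only; the class, the role names, and the action names are all assumptions, and it is not a description of any particular RBAC implementation.

```python
class RoleRegistry:
    """Roles are granted per subject (for example a section), not globally."""

    def __init__(self):
        self._role_actions = {}   # role name -> set of permitted actions
        self._grants = set()      # (user, role, subject) triples

    def define_role(self, role, actions):
        self._role_actions[role] = set(actions)

    def grant(self, user, role, subject):
        self._grants.add((user, role, subject))

    def can(self, user, action, subject):
        """True if any role granted to the user on this subject permits the action."""
        return any(
            u == user and s == subject and action in self._role_actions.get(r, ())
            for (u, r, s) in self._grants
        )
```

For instance, granting a user the author role on one section and the editor role on another allows publishing in the second section but not in the first.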
Comments and reviews
One of the successful innovations brought about by widespread use of the Web has been feedback through comments and reviews. Amazon is only one of many sites that now include reviews by customers of the products on sale. It could be said that this is a form of social networking, as the more sophisticated sites maintain profiles of reviewers and encourage them to achieve their own identity. Regular readers in particular areas of interest can get to know reviewers and form an opinion on the reliability of their views.
There are two main problems with implementing comments and reviews. One is the question of how to generalize the facility, so as to avoid implementing it repeatedly in different applications. The other is how to deal with the ever present threat of spam.
From the point of view of a developer, handling comments raises much the same issues regardless of what may be the subject of the comments. So blogs, selections of products, image galleries, and so on are all capable of having comments added to their items using similar mechanisms. This suggests a structure in which the coarse-grained unit is the component, but its display is achieved through a template and a number of modules. Comments can thus be generated by a module that knows relatively little about the application: only enough to keep its comments separate from those for other applications and to relate a set of comments to a particular item, whether it is a blog item, product, gallery image, or anything else. That deals with the display of existing comments, but still leaves a requirement for a general interface that allows new comments to be added. The comment facility can handle the acceptance of a new comment easily enough, although it may need help if the page that accepts comments is also to show the object to which the comment applies. The comment facility also needs to know where to hand control once a new comment has been completed. Some moderately tricky detailed design is involved in providing an implementation of the full scheme.
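The essence of such a general facility can be sketched in a few lines of Python. The names here are hypothetical; the point is only that the store knows an application name and an opaque item identifier, and that the calling application says where control should go after a comment is accepted.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Comment:
    author: str
    text: str


class CommentStore:
    """Knows only an application name and an opaque item id."""

    def __init__(self):
        self._by_target = defaultdict(list)   # (application, item_id) -> comments

    def add(self, application, item_id, author, text, return_url):
        self._by_target[(application, item_id)].append(Comment(author, text))
        # The calling application supplied where control goes afterwards.
        return return_url

    def for_item(self, application, item_id):
        """Comments for one item of one application, oldest first."""
        return list(self._by_target.get((application, item_id), ()))
```

A blog item and a gallery image can share the same item number without their comments ever being mixed, since the application name is part of the key.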
The other big problem with any facility that permits visitors to a site to enter information for display is that it attracts spammers. Usually, they arrive not in person but in the form of automated bots that can become very sophisticated. There are bots that know how to obtain an account, and log in to a range of systems. There are even bots that can handle CAPTCHAs (those messed up images out of which you are supposed to decipher letters or numbers). Some of the bots can handle CAPTCHAs better than some humans, which makes for accessibility problems. Fortunately, much link spamming is for the purpose of promoting websites, and so the spammer has to give away some information in the form of the link to the site being promoted. A reasonably effective defense against this kind of spamming is a collaborative scheme for blacklisting sites. Even that is not totally effective, as spammers find ways to create new sites quickly and cheaply, so that the threat is constantly changing. As with most forms of attack, there is unlikely to be any conclusion to this battle.
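The local half of a blacklisting scheme might look something like the following Python sketch: it extracts the hosts linked from a submission and checks each one, together with its parent domains, against a set of blacklisted names. How the blacklist is shared and updated between sites is not shown, and the function names and the URL pattern are assumptions.

```python
import re
from urllib.parse import urlparse

# Deliberately simple pattern; real submissions may need something stricter.
URL_PATTERN = re.compile(r"https?://[^\s<>\"']+", re.IGNORECASE)


def linked_hosts(text):
    """Return the set of host names linked from the text."""
    hosts = set()
    for url in URL_PATTERN.findall(text):
        try:
            host = urlparse(url).hostname   # already lower-cased by urlparse
        except ValueError:                  # malformed URL, skip it
            continue
        if host:
            hosts.add(host.rstrip("."))
    return hosts


def is_blacklisted(text, blacklist):
    """True if any linked host, or a parent domain of it, is blacklisted."""
    for host in linked_hosts(text):
        parts = host.split(".")
        # For "a.b.example" check "a.b.example", "b.example", and "example".
        for i in range(len(parts)):
            if ".".join(parts[i:]) in blacklist:
                return True
    return False
```

Checking parent domains means that a spammer cannot evade the list simply by inventing a new subdomain of a site that is already blacklisted.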
Forums
Forums are a very popular Web feature, providing a structured means for public or private discussion. Developing a forum is a major undertaking, and most people will prefer to choose from existing software products. Forum software usually provides for visitors to contribute messages, either starting a new topic or replying to an existing one. There is often a hierarchical structure to the messages so that a number of different areas of interest can be covered in a convenient way. Advanced systems include sophisticated user management, including support for a variety of different groups, which provides a means to decide who has access to which topics. Unwanted messages are a constant threat, and most active forums need moderators to weed them out.
Development of a new forum will clearly need a number of the framework features discussed earlier. Robust user control is essential, and if different users are granted different access rights, a good system of RBAC is a requirement. A forum is highly amenable to the use of cache, since pages are likely to be constructed out of a number of database records, but the records are updated relatively infrequently. To be responsive, the cache needs to have a degree of intelligence so that pages with new contributions are refreshed quickly. Mail services are likely to be employed so that subscribers can receive notification of new contributions to topics in which they have registered an interest.
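The simplest form of such intelligence is to discard a cached page as soon as a contribution arrives for its topic, leaving every other page untouched. The Python sketch below shows the idea under that assumption; the class is hypothetical and the `render` callable stands in for whatever builds a page from database records.

```python
class TopicPageCache:
    """Caches rendered topic pages; a new post discards only that topic's page."""

    def __init__(self, render):
        self._render = render   # callable: topic_id -> page text (hypothetical)
        self._pages = {}

    def get(self, topic_id):
        """Return the cached page, building it first if necessary."""
        if topic_id not in self._pages:
            self._pages[topic_id] = self._render(topic_id)
        return self._pages[topic_id]

    def on_new_post(self, topic_id):
        """Forget the page for a topic so the next request rebuilds it."""
        self._pages.pop(topic_id, None)
```

Pages for quiet topics are then served from the cache indefinitely, while an active topic is rebuilt only once per new contribution.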
Another approach is to seek a degree of integration between off-the-shelf forum software and the CMS. The most popular area for integration is user login. Obviously it is necessary to obtain some information about the way in which the forum software is implemented. Provided that can be found, it is a relatively simple matter to integrate with a CMS that has been built with plentiful plug-in triggers around the area of user authentication. From the point of view of visual integration, the amount of screen space needed by a forum is such that it is often difficult to build it within the framework of a typical CMS. Often a better approach is to build a custom theme for the forum that includes links back to the main site, so as to avoid completely losing continuity of navigation.
Galleries, repositories, and streaming
Although they have come from different requirements, galleries and file repositories have a lot in common. Both start out simple and rapidly become complex. The general idea of a gallery is to build a collection of images, typically organized into categories and accessible via small versions of the images (thumbnails). File repositories have been popular since the days of bulletin boards, where collections of files (often programs) were made available for download. Ideally the organization into categories (or folders or containers) is flexible with no particular limit on the depth to which subcategories can go.
Some basic requirements relate to security. It is obviously essential to avoid hosting files that could contain malicious PHP code. This includes avoiding uploads of image files that contain PHP code embedded within actual image data. Simple checks can be fooled by this technique, but provided the server is configured to interpret only files with the .php extension, blocking that extension on upload prevents the embedded code from being interpreted. Another potentially major security issue is bandwidth theft. If files or images are too easily accessed, then other sites may choose to use them without acknowledgment, transferring the bandwidth costs to the site hosting the material.
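A slightly stronger variant of blocking one extension is to accept only a short list of known extensions and to let the server choose the stored file name. The Python sketch below takes that approach, which goes a little beyond what is described above; the allowlist and the function name are illustrative assumptions.

```python
import os
import secrets

ALLOWED_EXTENSIONS = {".jpg", ".jpeg", ".png", ".gif"}   # illustrative allowlist


def safe_stored_name(original_name):
    """Return a server-chosen file name, or None if the upload is refused."""
    base = os.path.basename(original_name.replace("\\", "/"))
    _, ext = os.path.splitext(base)
    ext = ext.lower()
    if ext not in ALLOWED_EXTENSIONS:
        return None
    # Discard the visitor's name entirely, so that "x.php.jpg" or "../x"
    # cannot influence how or where the file is stored.
    return secrets.token_hex(16) + ext
```

This does nothing to detect PHP code hidden inside image data; it only ensures that the stored file never carries a name the server might be willing to execute.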
As applications broaden, access control becomes an issue. Files may need to be made available only to a restricted group, and uploads may be restricted more tightly still. There may be administrator oversight, with uploads needing approval. Once again, we are seeing a demand for an effective access control system, preferably role-based. In fact demands on systems of this kind can easily become very sophisticated, such as allowing users to have personal upload areas over which they have complete control to determine who is able to gain access. An RBAC system that is technically capable of handling this can be built relatively easily, although creating a good user interface is a challenge.
Whether the system is a gallery or file repository, the use of thumbnail images is increasingly prevalent. File uploads may, therefore, be accompanied by one or more image files that are used to enhance the display of the files available.
Information about the system is likely to be needed, such as which are the most recent additions to the collection, which items are most popular, who has accessed what, and who has uploaded what. Information of this kind can also contribute to security by providing an audit trail of what has been happening to the system.
Streaming of files is a demand now often placed on a file repository, as the files can be audio or video files made available for immediate access. Streaming is simply a mode of file processing whereby the information is delivered to the user at a speed adequate for consumption in real time. Clearly video tends to place greater demands on the system than audio. The problems are both hardware and software related, although with steadily improving technology it is increasingly feasible to overcome both.
E-commerce and payments
Everyone is aware of the huge growth of commercial transactions on the Web. The kind of transaction involved can vary widely across simple fixed price retail sales, auctions of various kinds, and reverse auctions for procurement. For retail transactions immediate settlement is usually required, whereas larger scale business to business transactions are usually handled through relatively traditional invoicing methods. Even those are tending to be altered towards paperless billing and payment schemes that cut transaction costs to a minimum.
Systems for e-commerce vary enormously in their sophistication from simple requests for payment using a PayPal button to highly sophisticated Web operations such as Amazon and eBay. Open source PHP software exists to cover a significant part of this spectrum, some of it in the form of extensions to CMS frameworks.
PayPal has achieved a very high profile, especially with smaller operators, by offering easy access for merchants combined with technology that is relatively simple to implement. This includes the ability to complete a transaction with online confirmation in a way that is suitable for the sale of electronically deliverable goods such as software.
Clearly, robust authentication of users is essential for e-commerce. For all but the simplest transactions, some kind of shopping cart is highly desirable. These requirements imply a need for good session handling, preferably taking effect as soon as a visitor arrives at a site. Nearly every shopping site will allow a visitor to accumulate items in a shopping cart prior to any kind of login.
There is a plethora of payment systems, some of them suitable mainly for large volume uses, but others that can be applied on a small scale. A particular CMS framework might adopt some standard payment mechanisms that are then integral to the CMS and can be used whenever needed. Security is obviously paramount, as loss of data is both financially damaging and extremely bad for the site's reputation.
E-commerce sites also often use a number of the features described in other sections here. A popular addition is the ability for customers to review the items they have purchased. This kind of facility may lead to further requirements to distinguish categories of users so as to give incentives to people who regularly write reviews.
Form generators
One thread that has appeared repeatedly is my concern over the high cost of development for websites. Some developers have tackled this by building systems that support the construction of forms using a simple specification language. Form generators have long been popular as one tool for general software development, so it is no surprise that the idea should be applied to the web.
There are practical issues such as the effective validation of user input and the effective storage of the captured information. Structured information may not fit easily into a simple database structure, making it hard for generalized code to cope. Demands for flexibility in the way captured information is presented are likely to make the system grow more complex.
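To show what generalized validation driven by a specification can look like, here is a deliberately tiny Python sketch. The specification format (a list of dictionaries with `name`, `type`, and `required` keys) is invented for this example and supports only two field types.

```python
# Maps a field type named in the specification to a converter function.
FIELD_TYPES = {
    "text": str,
    "int": int,
}


def validate(spec, submitted):
    """Check submitted strings against a form specification.

    Returns (clean_values, errors); errors maps a field name to a message.
    """
    clean, errors = {}, {}
    for field in spec:
        name = field["name"]
        raw = submitted.get(name, "").strip()
        if not raw:
            if field.get("required", False):
                errors[name] = "required"
            continue
        try:
            clean[name] = FIELD_TYPES[field["type"]](raw)
        except ValueError:
            errors[name] = "invalid " + field["type"]
    return clean, errors
```

Even this small example hints at the pressure towards complexity: dates, choices from a list, dependencies between fields, and custom messages would each need additions to both the specification and the code that interprets it.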
In fact, the tendency towards complexity is the biggest issue with highly parameter-driven systems like a form generator. Pretty much any problem can be solved, if it is solvable at all, using a third-generation programming language such as C. But the price paid for this generality is that all but the simplest programs require specialist skills for their construction. A parameterized system aims to provide flexibility without demanding specialist skills. It is a difficult balance to achieve. If the system is too simple, people will be dissatisfied with its capabilities. If it is too complex, the development problems may become overwhelming and the user is likely to have difficulty building correct parameters. Only a few systems achieve a good balance, and these are often dedicated to solving some particular problems.
Nonetheless, form generators are one worthwhile route to leveraging software development effort, so as to produce websites at less cost and in shorter time.
Calendars
The peculiarities of calendars used in everyday life, such as irregular months that are not a whole number of weeks, mean that even the simplest calendar implementation has to cope with some moderately complex logic. All the same, calendars of one kind or another are popular for websites.
When someone has to choose a date, whether in the administration of a site or as a part of a service to visitors, it may well be preferable to offer a visually helpful calendar rather than expecting a date to be keyed in. Entering dates can easily result in confusion and complex validation over issues such as the format being used. A visual calendar is easy to grasp, although it can be tedious to use for distant dates and may present accessibility issues. Probably the best solution for this situation is a combination of input field and calendar.
Complexity rises with calendars that show events of one kind or another, but this can be a valuable feature for many sites. If the events are set only by administrators, then validation is quite simple, but if visitors are allowed to post events, then there is a need for some kind of validation and perhaps an approval system.
The most sophisticated calendar systems relate to real time bookings where a visitor to a website is able to book a block of time for some facility. The validation issues become even more substantial, and bookings may need to be integrated with a payment system or perhaps a system that manages subscribers who have booking rights.
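At the heart of booking validation is a test for overlapping time blocks. The Python sketch below treats each booking as a half-open interval, so that one booking may begin exactly when another ends; the function names are hypothetical, and a real system would also have to guard against two visitors booking at the same instant.

```python
from datetime import datetime


def overlaps(start_a, end_a, start_b, end_b):
    """Half-open intervals [start, end) overlap if each starts before the other ends."""
    return start_a < end_b and start_b < end_a


def try_book(bookings, start, end):
    """Append (start, end) to bookings if it is valid and free; return success."""
    if start >= end:
        return False
    if any(overlaps(start, end, s, e) for s, e in bookings):
        return False
    bookings.append((start, end))
    return True
```

With an existing booking from 10:00 to 11:00, a request for 10:30 to 11:30 is refused, while a request for 11:00 to 12:00 is accepted.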
Integration
An interesting area with great potential is integration of web software. Candidates abound, but a good illustration is forum software. Forums have been mentioned already as a content category and there are developments that use a particular CMS as a platform. But the area of forum software has been sufficiently important that many standalone systems have been written. Since their developers have so far been disinclined to commit themselves to any particular CMS, the alternative is to attempt integration between the CMS framework and an independent forum.
High on the list of integration aims is the alignment of user authentication. As mentioned above, authentication is a vital part of systems that permit website visitors to enter information for general publication. Yet nothing is more calculated to frustrate visitors than a website that operates multiple, separate authentication systems. Whether or not login is confined to a single point within the combined system, the objective is that a single login provides access to the whole range of services offered by the site.
Other targets for integration involve access to the information of the forum. This permits features in the main website such as listing out the most recent posts in the forum, or highlighting the most prolific authors of posts. Such services can help to integrate the whole site, and to maintain the interest of visitors by emphasizing the variety and quality of the information available.
Although forums are the most obvious candidates for integration, many other services can be considered. A few examples are bug trackers, paperless billing systems, and blogs. In the case of blogs, there has been a definite tendency for single function systems to become more like a CMS framework, with WordPress being an obvious example.
A logical approach to integration involves building a general framework that can be extended in some way to handle specific cases. This might be through some form of plug-in or by building classes that provide a standardized interface. Whatever the approach, integration will be greatly helped if the CMS framework itself has ample hooks for plug-ins or extensions.
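The kind of hook mechanism that makes integration practical can be sketched very briefly in Python. The registry class and the event names are assumptions for illustration; the idea is simply that the framework announces events such as a successful login, and an integration plug-in reacts to them.

```python
from collections import defaultdict


class HookRegistry:
    """Lets extensions register callbacks for named events."""

    def __init__(self):
        self._handlers = defaultdict(list)   # event name -> list of callables

    def register(self, event, handler):
        self._handlers[event].append(handler)

    def trigger(self, event, *args):
        """Call every handler for the event, in order; return their results."""
        return [handler(*args) for handler in self._handlers.get(event, ())]
```

A forum bridge would register a handler for a hypothetical "after_login" event and use it to start a matching forum session, so that a single login covers both systems.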
RSS feeds
One category of content stems from the widespread availability of RSS feeds. There are various levels of RSS, but they share many features, and provide a standardized XML-based way for a website to offer some of its material in a form that can be easily used in other places. For example, most news sites will provide their latest stories as an RSS feed, providing headlines and a brief introduction. The summary information provided in an RSS feed is accompanied by a link to the providing website, where fuller details are usually made available.
Feeds can be consumed to provide a major feature within a website, or can be used as supplementary information, fitted into sidebars or similar locations. For PHP-based systems, many people use the Magpie RSS reader to provide the basic functionality. Magpie is an open source project that has yielded a good reader with a reasonably simple interface, although Magpie has not seen any recent development. The Magpie project can be found at http://magpierss.sourceforge.net/.
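To give a feel for what a reader has to do, the following Python sketch pulls headlines, links, and introductions out of an RSS 2.0 document using only the standard library. It does not use Magpie and is far less forgiving of malformed feeds; it also assumes the feed comes from a trusted source, since the standard XML parser is not hardened against maliciously constructed documents.

```python
import xml.etree.ElementTree as ET


def read_rss_items(xml_text, limit=5):
    """Return up to `limit` items from an RSS 2.0 document as dictionaries."""
    root = ET.fromstring(xml_text)   # root is the <rss> element
    items = []
    for item in root.findall("./channel/item")[:limit]:
        items.append({
            "title": (item.findtext("title") or "").strip(),
            "link": (item.findtext("link") or "").strip(),
            "description": (item.findtext("description") or "").strip(),
        })
    return items
```

Fetching the feed over the network and caching the result, so that the providing site is not contacted on every page view, are left out of the sketch.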
Other categories
The extent of website content is only limited by people's imagination, so it is difficult to summarize everything that can be achieved. Categories described above are the most popular, but many others have substantial representation.
Chat is a popular internet feature and many websites have provided the ability for visitors to exchange messages with one another. The obvious problems are to do with misuse and legal liability for the material that is transmitted.
Mailing lists have existed since the earliest days of the internet, and modern forms of the list are usually managed through a Web interface, both for subscribers and administrators. A complete implementation in PHP is liable to run into difficulties over the transmission of large numbers of emails. The hosting environment for PHP is designed for handling a series of requests, each of which is turned round relatively quickly. Processing and elapsed time are commonly restricted per request. It is often difficult to send emails very rapidly, and introducing pauses can cause the PHP program to run out of elapsed time. A possible solution is to use PHP to provide a front end to a standard mailing list system.
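A different workaround from the front end approach just mentioned is to queue the messages and send them in batches, each batch staying within the time allowed for one request or one scheduled run. The Python sketch below illustrates only the batching logic; the function name, the `send` callable, and the injectable clock are assumptions, and the queue would need to be stored persistently in practice.

```python
import time
from collections import deque


def send_batch(queue, send, time_budget, clock=time.monotonic):
    """Send queued messages until the queue is empty or the budget is spent.

    Returns the number sent; unsent messages stay queued for the next run.
    """
    deadline = clock() + time_budget
    sent = 0
    while queue and clock() < deadline:
        send(queue.popleft())
        sent += 1
    return sent
```

Each run makes some progress and then stops cleanly, so no single request is in danger of exceeding its elapsed time limit.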
Newsletters are a variant on the mailing list theme, but introduce more advanced facilities for developing the content, which is sent to the subscribers.
Polls are often a feature of websites, although all but the busiest sites tend to have difficulty getting enough participation to make the results meaningful. As an adjunct to comments and reviews, discussed above, poll type features are often used to introduce a quantitative element for feedback.
Given the attention they get from many people, weather forecasts are a popular feature of websites, usually derived from feeds provided by a small number of forecasting or broadcasting organizations in each part of the world.
Menus are an important kind of content, and are significant enough to a CMS framework to merit separate treatment.
Information about books is popular, especially if it involves personal selections accompanied by reviews. Often, there is a commercial angle to book choices, with links to a site such as Amazon for purchase of each book, and a small fee being paid for click-throughs that lead to a purchase.
There is also a host of possibilities for traditional applications such as project management. Almost any application can be given a Web interface and be incorporated into a CMS framework.