
Modernizing Legacy Applications in PHP

By Paul Jones
About this book
Have you noticed that your legacy PHP application is composed of page scripts placed directly in the document root of the web server? Or do your page scripts, along with any other classes and functions, combine the concerns of model, view, and controller into the same scope? Is the majority of the logical flow incorporated as include files and global functions rather than class methods? Working with such a legacy application feels like dragging your feet through mud, doesn't it? This book will show you how to modernize your application in terms of practice and technique, rather than in terms of using tools such as frameworks and libraries, by extracting and replacing its legacy artifacts. We will use a step-by-step approach, moving slowly and methodically, to improve your application from the ground up. We'll show you how dependency injection can replace dependencies created with new and pulled in with global. We'll also show you how to move the presentation logic into view files and the action logic into controllers. Moreover, we'll keep your application running the whole time: each completed step will leave your codebase fully operational and of higher quality. When we are done, you will be able to breeze through your code like the wind. Your code will be autoloaded, dependency-injected, unit-tested, layer-separated, and front-controlled. Most of the very limited code we will add to your application is specific to this book. We will be improving ourselves as programmers, as well as improving the quality of our legacy application.
Publication date: August 2016
Publisher: Packt
Pages: 286
ISBN: 9781787124707

 

Chapter 1. Legacy Applications

In its simplest definition, a legacy application is any application that you, as a developer, inherit from someone else. It was written before you arrived, and you had little or no decision-making authority in how it was built.

However, there is a lot more weight to the word legacy among developers. It carries connotations of being poorly organized, difficult to maintain and improve, hard to understand, untested or untestable, and a series of similar negatives. The application works as a product in that it provides revenue, but as a program, it is brittle and sensitive to change.

Because this is a book specifically about PHP-based legacy applications, I am going to offer some PHP-specific characteristics that I have seen in the field. For our purposes, a legacy application in PHP is one that matches two or more of the following descriptions:

  • It uses page scripts placed directly in the document root of the web server.

  • It has special index files in some directories to prevent access to those directories.

  • It has special logic at the top of some files to die() or exit() if a certain value is not set.

  • Its architecture is include-oriented instead of class-oriented or object-oriented.

  • It has relatively few classes.

  • Any class structure that exists is disorganized, disjointed, and otherwise inconsistent.

  • It relies more heavily on functions than on class methods.

  • Its page scripts, classes, and functions combine the concerns of model, view, and controller into the same scope.

  • It shows evidence of one or more incomplete attempts at a rewrite, sometimes as a failed framework integration.

  • It has no automated test suite for the developers to run.

These characteristics are probably familiar to anyone who has had to deal with a very old PHP application. They describe what I call a typical PHP application.

 

The typical PHP application


Most PHP developers are not formally trained as programmers, or are almost entirely self-taught. They often come to the language from other, usually non-technical, professions. Somehow or other, they are tasked with creating web pages because they are seen as the most technically savvy person in their organization. Since PHP is such a forgiving language and grants a lot of power without demanding a lot of discipline, it is very easy to produce working web pages, and even applications, without much training.

These and other factors strongly influence the underlying foundation of the typical PHP application. Such applications are usually not written in a popular full-stack framework or even a micro-framework. Instead, they are often a series of page scripts, placed directly in the web server document root, to which clients can browse directly. Any functionality that needs to be reused has been collected into a series of include files. There are include files for common configurations and settings, headers and footers, common forms and content, function definitions, navigation, and so on.

This reliance on include files in the typical PHP application is why I call them include-oriented architectures. The legacy application uses include calls everywhere to couple the pieces of the program into a single whole. This is in contrast to a class-oriented architecture, where, even if the application does not adhere to good object-oriented programming principles, at least the behaviors are bundled into classes.
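To make the contrast concrete, here is a minimal sketch assuming PHP 7.4 or later. The names log_message(), Logger, and $logLines are hypothetical, invented for illustration rather than taken from the book. The first half shows include-oriented style, where a global function reaches into global state; the second half bundles the same behavior into a class.

```php
<?php
// Include-oriented style: a global function, as it might appear in a
// hypothetical functions/log.php include file, that reaches into global state.
$logLines = [];

function log_message(string $message): void
{
    global $logLines; // hidden dependency on a global variable
    $logLines[] = $message;
}

// Class-oriented style: the same behavior bundled into a class. The state
// and the behavior that uses it now live together.
class Logger
{
    private array $lines = [];

    public function log(string $message): void
    {
        $this->lines[] = $message;
    }

    public function getLines(): array
    {
        return $this->lines;
    }
}

log_message('from a global function');

$logger = new Logger();
$logger->log('from a class method');
```

The class is not automatically "good" object-oriented code, but its dependencies are at least visible in one place instead of being scattered across globals.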

File Structure

The typical include-oriented PHP application generally looks something like this:

/path/to/docroot/
    bin/                    # command-line tools
    cache/                  # cache files
    common/                 # commonly-used include files
    classes/                # custom classes
        Image.php           #
        Template.php        #
    functions/              # custom functions
        db.php              #
        log.php             #
        cache.php           #
    setup.php               # configuration and setup
    css/                    # stylesheets
    img/                    # images
    index.php               # home page script
    js/                     # JavaScript
    lib/                    # third-party libraries
    log/                    # log files
    page1.php               # other page scripts
    page2.php               #
    page3.php               #
    sql/                    # schema migrations
    sub/                    # sub-page scripts
        index.php           #
        subpage1.php        #
        subpage2.php        #
    theme/                  # site theme files
        header.php          # a header template
        footer.php          # a footer template
        nav.php             # a navigation template

The structure shown is a simplified example. There are many possible variations. In some legacy applications, I have seen literally hundreds of main-level page scripts and dozens of subdirectories with their own unique hierarchies for additional pages. The key is that the legacy application is usually in the document root, has page scripts that users browse to directly, and uses include files to manage most program behavior instead of classes and objects.

Page Scripts

Legacy applications use individual page scripts as the access points for public behavior. Each page script is responsible for setting up the global environment, performing the requested logic, and then delivering output to the client.
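A drastically reduced, hypothetical page script following those three phases might look like the sketch below. In a real legacy application the setup and presentation would usually come from include files such as setup.php and the theme/ templates in the file structure above; they are inlined here only so the sketch runs on its own. The $config array and the "name" request parameter are assumptions for illustration.

```php
<?php
// Phase 1: set up the global environment. In a legacy application this
// usually arrives via include files such as setup.php.
$config = ['site_name' => 'Example Site']; // hypothetical configuration

// Phase 2: perform the requested logic, reading request input directly
// from a superglobal in the same scope.
$name = $_GET['name'] ?? 'world'; // hypothetical "name" parameter
$greeting = 'Hello, ' . $name . '!';

// Phase 3: deliver output, still in the same scope. The value is escaped
// here; many legacy scripts skip this step.
$html = '<h1>' . $config['site_name'] . '</h1>'
    . '<p>' . htmlspecialchars($greeting, ENT_QUOTES, 'UTF-8') . '</p>';

echo $html, PHP_EOL;
```

Configuration, input handling, logic, and markup all share one global scope, which is what makes scripts like this hard to test or reuse.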

Appendix A, Typical Legacy Page Script contains a sanitized, anonymized version of a typical legacy page script from a real application. I have taken the liberty of making the indentation consistent (originally, the indents were somewhat random) and wrapping it at 60 characters so it fits better on e-reader screens. Go take a look at it now, but be careful. I won't be held liable if you go blind or experience post-traumatic stress as a result! As we examine it, we find all manner of issues that make maintenance and improvement difficult:

  • include statements to execute setup and presentation logic

  • inline function definitions

  • global variables

  • model, view, and controller logic all combined in a single script

  • trusting user input

  • possible SQL injection vulnerabilities

  • possible cross-site scripting vulnerabilities

  • unquoted array keys generating notices

  • if blocks not wrapped in braces (a line added to the block later will not actually be part of it)

  • copy-and-paste repetition
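Two of the issues listed above are easy to demonstrate in a few lines. The sketch below, with hypothetical variable names, shows why an unbraced if block is fragile and how escaping output with htmlspecialchars() neutralizes a simple cross-site scripting payload. It is an illustration only, not code from the appendix.

```php
<?php
// Issue: if blocks not wrapped in braces.
$isAdmin = false;
$actions = [];

if ($isAdmin)
    $actions[] = 'show admin menu';
    $actions[] = 'delete records'; // indented, but NOT inside the if: always runs

// With braces, every statement inside them is conditional, including lines
// added later.
$safeActions = [];
if ($isAdmin) {
    $safeActions[] = 'show admin menu';
    $safeActions[] = 'delete records';
}

echo implode(', ', $actions), PHP_EOL; // prints "delete records"
echo count($safeActions), PHP_EOL;     // prints "0"

// Issue: cross-site scripting. Untrusted input echoed as-is keeps its markup;
// htmlspecialchars() converts the special characters into HTML entities.
$input = '<script>alert(1)</script>'; // hypothetical untrusted value
$unsafe = '<p>' . $input . '</p>';
$safe = '<p>' . htmlspecialchars($input, ENT_QUOTES, 'UTF-8') . '</p>';

echo $safe, PHP_EOL; // prints "<p>&lt;script&gt;alert(1)&lt;/script&gt;</p>"
```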

The example in Appendix A, Typical Legacy Page Script, is relatively tame as far as legacy page scripts go. I have seen other scripts that mix in JavaScript and CSS code, along with remote-file inclusions and all sorts of security flaws. It is also only (!) about 400 lines long. I have seen page scripts thousands of lines long that generate several different page variations, all wrapped into a single switch statement with a dozen or more case conditions.

Rewrite or Refactor?

Many developers, when presented with a typical PHP application, are able to live with it for only so long before they want to scrap it and rewrite it from scratch. "Nuke it from orbit; it's the only way to be sure!" is the rallying cry of these enthusiastic and energetic programmers. Other developers, their enthusiasm drained by their death-march experiences, are cautious and wary of such a suggestion. They are fully aware that the codebase is bad, but the devil (or in our case, code) they know is better than the devil they don't.

The Pros and Cons of Rewriting

A complete rewrite is a very tempting idea. Developers championing a rewrite feel like they will be able to do all the right things the first time through. They will be able to write unit tests, enforce best practices, separate concerns according to modern pattern definitions, and use the latest framework or even write their own framework (since they know best what their own needs are). Because the existing application can serve as a reference implementation, they feel confident that there will be little or no trial-and-error work in rewriting the application. The needed behaviors already exist; all the developers need to do is copy them to the new system. The behaviors that are difficult or impossible to implement in the existing system can be added on from the start as part of the rewrite.

As tempting as a rewrite sounds, it is fraught with many dangers. Joel Spolsky had this to say regarding the old Netscape Navigator web browser rewrite in 2000:

 

Netscape made the single worst strategic mistake that any software company can make by deciding to rewrite their code from scratch. Lou Montulli, one of the 5 programming superstars who did the original version of Navigator, emailed me to say, "I agree completely, it's one of the major reasons I resigned from Netscape." This one decision cost Netscape 3 years. That's three years in which the company couldn't add new features, couldn't respond to the competitive threats from Internet Explorer, and had to sit on their hands while Microsoft completely ate their lunch.

 
 --Joel Spolsky, Netscape Goes Bonkers

Netscape went out of business as a result.

Josh Kerr relates a similar story regarding TextMate:

 

Macromates, an indie company who had a very successful text editor called Textmate, decided to rewrite the code base for Textmate 2. It took them 6 years to get a beta release out the door which is an eternity in today's time and they lost a lot of market share. When they did release a beta, it was too late and 6 months later they folded the project and pushed it on to Github as an open source project.

 
 --Josh Kerr, TextMate 2 And Why You Shouldn't Rewrite Your Code

Fred Brooks calls the urge to do a complete rewrite the second-system effect. He wrote about this in 1975:

 

The second is the most dangerous system a man ever designs. ... The general tendency is to over-design the second system, using all the ideas and frills that were cautiously sidetracked on the first one. ... The second-system effect has ... a tendency to refine techniques whose very existence has been made obsolete by changes in basic system assumptions. ... How does the project manager avoid the second-system effect? By insisting on a senior architect who has at least two systems under his belt.

 
 --Fred Brooks, The Mythical Man-Month, pp. 53-58.

Developers were the same forty years ago as they are today. I expect them to be the same over the next forty years as well; human beings remain human beings. Overconfidence, insufficient pessimism, ignorance of history, and the desire to be one's own customer all lead developers easily into the rationalization that "this time will be different" when they attempt a rewrite.

Why Don't Rewrites Work?

There are lots of reasons why a rewrite rarely works, but I will concentrate on only one general reason here: the intersection of resources, knowledge, communication, and productivity. (Be sure to read The Mythical Man-Month (pp. 13-26) for a great description of the problems associated with thinking of resources and scheduling as interchangeable elements.)

As with all things, we have only limited resources to bring to bear on the rewrite project. There are only a certain number of developers in the organization. These are the developers who will have to both maintain the existing program and write the completely new version of it. Any developer working on one project will not be able to work on the other.

The Context-Switching Problem

One idea is to have the existing developers spend part of their time on the old application and part of their time on the new one. However, moving a developer between the two projects will not be an even split of productivity. Because of the cognitive load of context-switching, the developer will be less than half as productive on each.

The Knowledge Problem

To avoid the productivity losses from switching developers between maintenance and the rewrite, the organization may try to hire more developers. Some can then be dedicated to the old project and others to the new project. Unfortunately, this approach reveals what F. A. Hayek calls the knowledge problem. Originally applied to the realm of economics, the knowledge problem applies equally well to programming.

If we put the new developers on the rewrite project, they won't know enough about the existing system, the existing problems, the business goals, or perhaps even the best practices for doing the rewrite, to be effective. They will have to be trained on these things, most likely by the existing developers. This means the existing developers, who have been relegated to maintaining the existing program, will have to spend a lot of time communicating knowledge to the new hires. The amount of time involved is non-trivial, and the communication of this knowledge will have to continue until the new developers are as well-versed as the existing developers. This means that the linear increase in resources results in a less-than-linear increase in productivity: a 100% increase in the number of programmers will result in a less than 50% increase in output, sometimes much less (cf. The Miserable Mathematics of the Man-Month, http://paul-m-jones.com/archives/1591).
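For a rough sense of why added developers yield less-than-linear gains, The Mythical Man-Month counts the pairwise communication paths in a team of n people as n(n-1)/2. Doubling a hypothetical team from 4 to 8 developers gives:

$$
C(n) = \frac{n(n-1)}{2}, \qquad C(4) = \frac{4 \cdot 3}{2} = 6, \qquad C(8) = \frac{8 \cdot 7}{2} = 28, \qquad \frac{C(8)}{C(4)} \approx 4.67
$$

The team doubles, but the number of paths along which knowledge must flow grows by a factor of more than four. This is only an illustration of the effect, not the calculation behind the 50% figure above.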

Alternatively, we could put the existing developers on the rewrite project, and the new hires on maintenance of the existing program. This too reveals a knowledge problem because the new developers are completely unfamiliar with the system. Where will they get the knowledge they need to do their work? From the existing developers, of course, who will still need to spend valuable time communicating their knowledge to the new hires. Once again, we see that the linear increase in developers leads to a less-than-linear increase in productivity.

The Schedule Problem

To deal with the knowledge problem and the related communication costs, some may feel the best way to handle the project would be to dedicate all the existing developers to the rewrite, and delay maintenance and upgrades on the existing system until the rewrite is done. This is a great temptation because the developers will be all too eager to salve their own pains and become their own customers, getting excited about the features they want to have and the fixes they want to make. These desires will lead them to overestimate their own ability to perform a full rewrite and underestimate the amount of time needed to complete it. The managers, for their part, will accept the optimism of the developers, perhaps adding some buffer to the schedule for good measure.

The overconfidence and optimism of the developers will morph into frustration and pain when they realize the task is actually much greater and more overwhelming than they first thought. The rewrite will go on much longer than anticipated, not by a little, but by an order of magnitude or more. For the duration of the rewrite, the existing program will languish (buggy and missing features), disappointing existing customers and failing to attract new ones. The rewrite project will, at the end, become a panicked death march to get it done at all costs, and the result will be a codebase that is just as bad as the first one, only in different ways. It will be merely a copy of the first system, because schedule pressures will have dictated that new features be delayed until after an initial release is achieved.

Iterative Refactoring

Given the risks associated with a complete rewrite, I recommend refactoring instead. Refactoring means that the quality of the program is improved in small steps, without changing the functionality of the program. A single, relatively small change is introduced across the entire system. The system is then tested to make sure it still works properly, and finally, the system is put into production. A second small change builds on the previous one, and so on. Over a period of time, the system becomes markedly easier to maintain and improve.
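As an illustration of what one such small step can look like, the sketch below replaces a dependency pulled in with global by constructor injection while preserving behavior. It assumes PHP 7.4 or later. FakeDb, user_name_legacy(), and UserNames are hypothetical names; FakeDb stands in for a real database connection so the code runs without one. This is a sketch of the general idea, not the book's exact procedure.

```php
<?php
// Hypothetical stand-in for a database connection, so the sketch runs
// without a real database.
class FakeDb
{
    private array $rows;

    public function __construct(array $rows)
    {
        $this->rows = $rows;
    }

    public function fetchName(int $id): ?string
    {
        return $this->rows[$id] ?? null;
    }
}

// Before: the function reaches into global scope for its dependency.
function user_name_legacy(int $id): ?string
{
    global $db;
    return $db->fetchName($id);
}

// After one small step: the same behavior, but the dependency is passed in
// through the constructor (dependency injection) instead of via global.
class UserNames
{
    private FakeDb $db;

    public function __construct(FakeDb $db)
    {
        $this->db = $db;
    }

    public function get(int $id): ?string
    {
        return $this->db->fetchName($id);
    }
}

$db = new FakeDb([1 => 'Alice']);
$userNames = new UserNames($db);

// The step must not change behavior: old and new code paths agree.
$before = user_name_legacy(1);
$after = $userNames->get(1);
```

Checking that the old and new code paths return the same results is the kind of test that lets each small step go into production without changing what the application does.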

A refactoring approach is decidedly less appealing than a complete rewrite. It defies the core sensibilities of most developers. The developers have to continue working with the system as it is, warts and all, for long periods of time. They do not get to switch over to the latest, hottest framework. They do not get to become their own customers and indulge their desires to do things right the first time. Being a longer-term strategy, the refactoring approach does not appeal to a culture that values rapid development of new applications over patching existing ones. Developers usually prefer to start their own new projects, not maintain older projects developed by others.

However, as a risk-reducing strategy, using an iterative refactoring approach is undeniably superior to a rewrite. The individual refactorings themselves are small compared to any similar portion of a rewrite project. They can be applied in much shorter periods of time than a comparable feature would be in a rewrite, and they leave the existing codebase in a working state at the end of each iteration. At no point does the existing application stop operating or progressing. The iterative refactorings can be integrated into a larger process with scheduling that allows for cycles of bug fixes, feature additions, and refactorings to improve the next cycle.

Finally, the goal of any single refactoring step is not perfection. The goal in each step is merely improvement. We are not trying to realize an impossible goal over a long period of time. We are taking small steps toward easily-visualized goals that can be accomplished in short timeframes. Each small refactoring win will both improve morale and drive enthusiasm for the next refactoring step. Over time, these many small wins accumulate into a single big win: a fully-modernized codebase that has never stopped generating revenue for the business.

 

Legacy Frameworks


Until now, we have been discussing legacy applications as page-based, include-oriented systems. However, there is also a large base of legacy code out there using public frameworks.

Framework-based Legacy Applications

Each different public framework in PHP land is its own unique hell. Applications written in CakePHP (http://cakephp.org/) suffer from different legacy issues than those written in CodeIgniter, Solar, Symfony 1, Zend Framework 1, and so on. Each of these frameworks, and their various work-alikes, encourages different kinds of tight coupling in applications. Thus, the specific steps needed to refactor an application built on one of these frameworks are very different from the steps needed for another.

As such, various parts of this book may be useful as a guide to refactoring different parts of a legacy application based on a public framework, but as a whole, the book is not targeted at refactoring applications based on these public frameworks.

In-house, private, or otherwise non-public frameworks under the direct control of their own architects within the organization are likely to benefit from the refactorings included in this book.

Refactoring to a Framework

I sometimes hear about how developers wisely wish to avoid a complete rewrite and instead want to refactor or migrate to a public framework. This sounds like the best of both worlds, combining an iterative approach with the developers' desire to use the hottest new technology.

My experience with legacy PHP applications has been that they are almost as resistant to framework integration as they are to unit testing. If the application was already in a state where its logic could be ported to a framework, there would be little need to port it in the first place.

However, by the time we have completed the refactorings in this book, the application is very likely to be in a state that will be much more amenable to a public framework migration. Whether the developers will still want to do so is another matter.

 

Review and next steps


At this point, we have realized that a rewrite, while appealing, is a dangerous approach. An iterative refactoring approach sounds a lot more like actual work, but has the benefit of being achievable and realistic.

The next step is to prepare ourselves for the refactoring approach by getting some prerequisites out of the way. After that, we will proceed toward modernizing our legacy application in a series of relatively small steps, one step per chapter with each step broken down into an easy-to-follow process with answers to common questions.

Let's get started!

About the Author

    Paul M. Jones is an internationally recognized PHP expert who has worked as everything from junior developer to VP of Engineering in all kinds of organizations (corporate, military, non-profit, educational, medical, and others). He blogs professionally at www.paul-m-jones.com and is a regular speaker at various PHP conferences. Paul's latest open-source project is Aura for PHP. Previously, he was the architect behind the Solar Framework, and was the creator of the Savant template system. He was a founding contributor to the Zend Framework (the DB, DB_Table, and View components), and has written a series of authoritative benchmarks on dynamic framework performance. Paul was one of the first elected members of the PEAR project. He is a voting member of the PHP Framework Interoperability Group, where he shepherded the PSR-1 Coding Standard and PSR-2 Coding Style recommendations, and was the primary author on the PSR-4 Autoloader recommendation. He was also a member of the Zend PHP 5.3 Certification education advisory board. In a previous career, Paul was an operations intelligence specialist for the US Air Force. In his spare time, he enjoys putting .308 holes in targets at 400 yards.
