Your PHP website - how to factor and refactor to reduce growing pains

As your project grows ... what do you change? In an ideal world, you would know exactly what you were coding before you started, and write the full job to spec to last for many years. This isn't an ideal world, though. Our web site has changed over the years - we now have "version 8" (see [here] for some older versions), and although there are elements that do not change, it's now really a different site to what it was at the start.

Of course, it can never be an ideal world - we simply couldn't do in 2001 what we can do in 2011. And had we waited, we would have done nothing - and we would now be planning a site that was fit to be unaltered up to 2021. So work has to be done and then updated - much of the trick is in thinking far enough ahead to reduce that updating, and coding in a way that makes your work conducive to such changes. This is great stuff to say, but you don't necessarily know all the tricks from Day 1, nor is it practical to apply them all (for some will cost you development time for later return) in the early days. And I'm also an advocate of spike solutions - writing some experimental code to see if something will work, in the full knowledge that you'll simply learn from it and then regress and rewrite it from the strength of that knowledge. The "emailing your server" example I recently wrote - [here] - is a classic example of a spike solution; the proof of concept and exploration of initial algorithms is good, but it now needs to take on consideration of other MIME types, pick up attachment names and have a good layer of security added to it to stop people sending me emails that would get added to my databases.

So - what SHOULD you be looking at and considering as your project grows?

Refactoring to Objects

Separating out the code that relates to particular data types, so that it can have its own test harness, so that it can be re-used across multiple related programs, so that you can deal with multiple instances easily, so that calls to your data type can easily be sanitised at the caller interface (API), and so that you can have a whole series of similar types of data (e.g. bank accounts, stocks, bonds, futures ...) based on a common set of logic, with separate classes defining what changes in each case from the common logic.
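As a minimal sketch of that idea, the example below puts the common logic in one base class and lets each subclass define only what changes. The class names (Holding, BankAccount, Stock) and the figures are hypothetical, chosen purely for illustration.

```php
<?php
// Hypothetical classes for illustration - the names are not from a real system.
abstract class Holding
{
    protected $name;
    protected $units;

    public function __construct($name, $units)
    {
        // Sanitise once, at the caller interface, rather than in every script.
        if (!is_numeric($units) || $units < 0) {
            throw new InvalidArgumentException("units must be a non-negative number");
        }
        $this->name  = (string) $name;
        $this->units = (float) $units;
    }

    // Common logic shared by every type of holding.
    public function value()
    {
        return $this->units * $this->unitPrice();
    }

    // Each subclass defines only what differs from the common logic.
    abstract protected function unitPrice();
}

class BankAccount extends Holding
{
    protected function unitPrice()
    {
        return 1.0;   // one unit is one unit of currency
    }
}

class Stock extends Holding
{
    private $price;

    public function __construct($name, $units, $price)
    {
        parent::__construct($name, $units);
        $this->price = (float) $price;
    }

    protected function unitPrice()
    {
        return $this->price;
    }
}

// Multiple instances of related types, handled through the same interface.
$portfolio = array(new BankAccount("Savings", 250), new Stock("ACME", 10, 12.5));
$total = 0.0;
foreach ($portfolio as $holding) {
    $total += $holding->value();
}
echo "Total value: ", $total, "\n";
```

Because each class stands alone, it can be given its own test harness and re-used by any other program that needs the same data type.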

Detaching your (My)SQL

Your code will be written in Perl / Python / PHP / Ruby / C / C++ / Java / Lua / Tcl and changing that would be a major task - admitted. But almost all of these are very stable and well established bedrocks and I don't see them going away fast. However, many data stores are based on MySQL which has a far tighter licensing approach, and was already withdrawn (quite a time ago) from the base PHP distribution due to this. Now that it is in the hands of a commercial company (as is one of the languages I mention), I have to question how up to date the version that's free at the point of distribution will remain, and what future conditions there may be on the onward use of that free distribution. Some open source code requires paid licenses for you to sell it on as part of a software product, other open source code has a "viral" license which means that you have to use the same license yourself.

If you're using MySQL - it's excellent for most jobs - stick with it. But separate out the calls into a separate layer - change your calls to mysql_summat or mysqli_summat into calls to sql_summat and then you can write a simple intermediate wrapper so that all your MySQL calls are in one place. You then have just one (wrapper) file to deal with if you want to change your code to another broadly equivalent database. More extreme alternatives include adding in a degree of functionality to the wrapper so that calls to it from the main code can be much more generic, limiting the number of types of different calls to aid portability, and indeed using an intermediate code level (a database abstraction library / level) such as ODBC, AdoDB, OpenDBX, DBI:: and DBIx. Systems such as Django and Rails include code to detach the database from the application, and indeed to provide the additional validation of relationships and data integrity that you want but that doesn't always come with a database and direct calls.
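A minimal sketch of such a wrapper layer follows. The function names (sql_connect, sql_select, sql_execute) are hypothetical, and the sketch uses PHP's PDO extension underneath rather than the mysql_ or mysqli_ functions, with an in-memory SQLite database as a stand-in so that it runs without a server. It assumes the pdo_sqlite driver is installed.

```php
<?php
// Hypothetical wrapper file: the rest of the application calls only sql_*
// functions, so the database-specific code lives in this one place.

function sql_connect($dsn, $user = null, $password = null)
{
    $db = new PDO($dsn, $user, $password);
    // Report database errors as exceptions rather than silent failures.
    $db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
    return $db;
}

// Run a statement that changes data; returns the number of rows affected.
function sql_execute(PDO $db, $sql, array $params = array())
{
    $statement = $db->prepare($sql);
    $statement->execute($params);
    return $statement->rowCount();
}

// Run a query; returns all matching rows as associative arrays.
function sql_select(PDO $db, $sql, array $params = array())
{
    $statement = $db->prepare($sql);
    $statement->execute($params);
    return $statement->fetchAll(PDO::FETCH_ASSOC);
}

// Usage - the table and column names here are made up for the example.
$db = sql_connect('sqlite::memory:');
sql_execute($db, 'CREATE TABLE pages (name TEXT, title TEXT)');
sql_execute($db, 'INSERT INTO pages (name, title) VALUES (?, ?)',
    array('riverside', "Melksham's Riverside Walk"));
$rows = sql_select($db, 'SELECT title FROM pages WHERE name = ?', array('riverside'));
echo $rows[0]['title'], "\n";
```

Moving to another database would then start with the connection string passed to sql_connect, though any SQL that is specific to one database would still need reviewing.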

Using approaches such as these, you can make a far better tuned decision as to whether your data should be stored via a database server (such as MySQL, Oracle, PostgreSQL, MSSQL ...) or via a lighter structure within your code (SQLite, CSV files, Access), or indeed in different ways on different installations of your system.

Image - is the client / server structure of MySQL always going to be right for you?

Status Variables

Have you ever come across a conditional like this: "If there's enough petrol in the car for a 20 mile journey and we would like to go somewhere, or if there's no petrol but we need to go somewhere and we have the money for fuel and there is no public transport that will do OR if we have an urgent journey and either we have fuel or we have a way of buying it and we can't get a lift or it's not so close that we can walk anyway ....". Not easy to follow, is it?

Code that starts off simple develops complex conditions over time, and that makes it into a spider's web to test and would produce some quite horrid logic tables to check you've got everything right ... but only for you to find at a later date that some obscure set of conditions isn't met correctly. And then a single change will spread, like a ripple across a pond, and introduce new issues.

As you see your code moving towards this complexity of logic, start to divide the tests out and use status variables. You'll find that much of our code has $error = 0; or $aok = 1; at the top of the main logic, and then these variables get flipped whenever a test is made that will affect the validity of the final page results. It saves the need for a single truly horrid (and unmaintainable) check at the point at which a decision is made based on the presence of an error, and it also means that you don't have to store all the variables needed to help with that decision right through to the final decision point.

Exceptionally, I'm reasonably happy with the use of a global variable for a few error statuses - though an error object class is a much cleaner approach. And once you're going down the error status / error object route, you can use it to gather up error messages so that you can easily explain to your user which test it was that failed, rather than just giving them a "blue screen of death".
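The sketch below shows the status variable idea applied to a cut-down version of the petrol example. The function name, the array keys and the messages are all hypothetical; the point is that each test is made separately, flips $aok if it fails, and records a message for the user.

```php
<?php
// Hypothetical check, loosely based on the petrol example above.
function check_journey(array $car, $distance)
{
    $aok = 1;               // status variable: assume all is well to start
    $messages = array();    // gathers an explanation for each failed test

    // Test 1 - is there a journey to make at all?
    if ($distance <= 0) {
        $aok = 0;
        $messages[] = "No journey distance was given";
    }

    // Test 2 - can we cover the distance, or buy the fuel to do so?
    if ($car['fuel_miles'] < $distance && !$car['money_for_fuel']) {
        $aok = 0;
        $messages[] = "Not enough fuel, and no money to buy more";
    }

    return array($aok, $messages);
}

list($aok, $messages) = check_journey(
    array('fuel_miles' => 5, 'money_for_fuel' => false), 20);

if ($aok) {
    echo "Off we go\n";
} else {
    // Tell the user which tests failed rather than just refusing.
    foreach ($messages as $message) {
        echo "Sorry: ", $message, "\n";
    }
}
```

Adding a further condition later means adding one more small test, not reworking a single long conditional.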

Commonality Tables

Have you ever seen code like this:

  $_SESSION['name'] = get_magic_quotes_gpc() ? stripslashes($_REQUEST['name']) : $_REQUEST['name'];

  $_SESSION['email'] = get_magic_quotes_gpc() ? stripslashes($_REQUEST['email']) : $_REQUEST['email'];

  $_SESSION['phone'] = get_magic_quotes_gpc() ? stripslashes($_REQUEST['phone_no']) : $_REQUEST['phone_no'];

  $_SESSION['course'] = get_magic_quotes_gpc() ? stripslashes($_REQUEST['course']) : $_REQUEST['course'];

  etc

Oh - for goodness sake - write an array and a loop. The code will end up much shorter, and you'll be able to spot any breaks from the pattern very easily in the maintenance phase of the project. (Did you note that "phone_no" became "phone" in my example?)
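One possible shape for that array and loop is sketched here. The helper names (clean_request_value, collect_fields) are hypothetical, and defaulting a missing field to an empty string is an assumption of this sketch. Note that get_magic_quotes_gpc() was removed in PHP 8, so the sketch checks that it exists before calling it.

```php
<?php
// Session key => request field. The one break from the pattern
// (phone / phone_no) now stands out at a glance.
$fields = array(
    'name'   => 'name',
    'email'  => 'email',
    'phone'  => 'phone_no',
    'course' => 'course',
);

// Hypothetical helper: undo magic quoting only where PHP still applies it.
function clean_request_value($value)
{
    $quoted = function_exists('get_magic_quotes_gpc') && @get_magic_quotes_gpc();
    return $quoted ? stripslashes($value) : $value;
}

// Hypothetical helper: pick the listed fields out of a source array.
// A missing field becomes an empty string (an assumption of this sketch).
function collect_fields(array $fields, array $source)
{
    $result = array();
    foreach ($fields as $key => $field) {
        $result[$key] = isset($source[$field])
            ? clean_request_value($source[$field])
            : '';
    }
    return $result;
}

// One loop replaces the four near-identical lines.
foreach (collect_fields($fields, $_REQUEST) as $key => $value) {
    $_SESSION[$key] = $value;
}
```

Adding a fifth field is then a one line change to the array.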

Pagination

I wrote a system in 199x for worldwide time card completion by a team of specialists out in "the field". The team comprised a few dozen people, and there were 74 project / task codes. The system was a miracle solution for the company who used it, allowing bills to be sent out from HQ within a couple of days of each month end. In 200x (where "x" has the same value as previously - i.e. 10 years later), I received a complaint that the updating of the project / task codes - a monthly job for one of their admin staff - was getting very slow; it turned out they were up to 1035 project / task codes, giving the lie to the "it will only grow a little" that I had been told when the system was written. And updating was done by downloading the whole project / task file into an editor window on the browser and re-uploading.

There comes a point where a pagination system needs to be built into code - and it's far better to anticipate that up front than later on. Later on, the amount of work can seem quite disproportionate. And with a "page 1, page 2, page 3" type approach you can do OK ... when you get up to dozens of pages, you need to have sorting, filtering and searching options too. You then start looking at user configurability and favourites. See [here] for a paginated example of our blog ... with search box to let you search through titles too. Far better than having everything on one page, but still plenty of scope for improvement due to data size. The blog started off as a few weeks' experiment. And has been running since 2004 ...
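A minimal sketch of the "page 1, page 2, page 3" approach is shown below, using a list of numbers as a stand-in for the 1035 project / task codes. The function name paginate and the page size of 25 are assumptions for illustration. With a database behind the system, you would more likely ask the database for just one page of rows (for example with LIMIT and OFFSET in MySQL) rather than slicing an array.

```php
<?php
// Hypothetical helper: returns one page of items, plus the page numbers
// needed to build "previous / next" links.
function paginate(array $items, $page, $per_page = 25)
{
    $total_pages = max(1, (int) ceil(count($items) / $per_page));

    // Clamp the requested page so that a bad URL parameter can't
    // take us off either end of the list.
    $page = min(max(1, (int) $page), $total_pages);

    return array(
        'page'        => $page,
        'total_pages' => $total_pages,
        'items'       => array_slice($items, ($page - 1) * $per_page, $per_page),
    );
}

$codes = range(1, 1035);            // stand-in for the project / task codes
$view  = paginate($codes, 42);

echo "Page {$view['page']} of {$view['total_pages']}: ",
     count($view['items']), " codes\n";
```

Sorting, filtering and searching would then be applied to the list before it is cut into pages.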

Towards more Formalised MVC

Are your staff excellent programmers? Are they superb graphic designers? Do they understand the structure of the data really well? And are they well versed in the layout of website URLs, mod_rewrite and the like? If you can honestly answer "yes" to all those questions, for every member of your team, you truly have a team that's second to none. And probably a team that requires a really high salary, with gaps being very hard to fill if any one person were to leave. But chances are that even with your fantastic team, they'll be losing time as they work on your web site if all the HTML is mixed in with the PHP code, the MySQL, the CSS, the JavaScript and the natural language (English?) content.

The "four layer model" - or "MVC technology" - can help you resolve this issue by uncoupling each of the elements. MVC is really just a fancy name for a design approach in which each of the elements is written in its own area, and with each area having its own expert / group of experts with a clearly defined interface between them. It means that you can have the world's best designers working on the look and feel of your pages, without them needing to understand any programming, without them having to 'lock' the code away from the programmers because they're working on files that include programs, and without affecting the functionality by making changes. And it means that your programmers can be sorting out users' issues, developing new functions and algorithms without having to push through a forest of style and HTML tags to do so. By separating out the "business logic" into its own area of the code, you can even use it for offline processing (batch / overnight stuff, to provide "web2" services, etc), and by separating out the look and feel into templates, you can change the template and give your website a new lick of paint - or a personalised paint job for different clients - very easily.

P.S. MVC = Model, View, Controller. Or ... how that data is structured and held, how the screen presentation is done, and how that data is moved from the model to the view.
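To make the separation concrete, here is a very small sketch with each of the three parts in its own function. All of the names, the course data and the %placeholder% template style are hypothetical; in a real application each part would live in its own file, and the model would read from your data store.

```php
<?php
// Model - how the data is structured and held.
// (A hard-coded array stands in for the database in this sketch.)
function model_get_course($code)
{
    $courses = array(
        'example' => array('title' => 'Example Course', 'days' => 3),
    );
    return isset($courses[$code]) ? $courses[$code] : null;
}

// View - how the presentation is done. The template contains no program
// logic, so a designer can change it without touching the code.
function view_render($template, array $data)
{
    foreach ($data as $key => $value) {
        $template = str_replace(
            '%' . $key . '%',
            htmlspecialchars((string) $value),
            $template
        );
    }
    return $template;
}

// Controller - moves the data from the model to the view.
function controller_show_course($code)
{
    $course = model_get_course($code);
    if ($course === null) {
        return view_render('<p>No such course: %code%</p>', array('code' => $code));
    }
    return view_render('<h1>%title%</h1><p>%days% days</p>', $course);
}

echo controller_show_course('example'), "\n";
```

Giving the site a new look then means changing only the template strings, and the model can be re-used by a batch job that never produces any HTML.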

Single Encompassing script

If you want to restock your freezer at Tesco, you don't just materialise there in the frozen food aisle. You get a trolley, go in through the front door, and walk there. And then you don't just warp home, you pay, and then take the car or get the bus. So why do web applications so often have many different scripts, each of which has a URL that exposes it to the world and is tempting the user to bookmark it and return into the middle, and is encouraging the search engine to drop people off with you totally out of context?

It's far better to have everything in the same front end wrapper. The fourth layer of the four layer model, mentioned above, allows you to do this easily without letting your code get out of hand. require, include, from and import type statements let that top level - sometimes known as the framework or scaffold - conditionally load code so that you don't have the whole thing in memory for every little user action. The front end wrapper can validate and route the user uniformly, and just once in the code, without the need for all sorts of checks and traps in intermediate pages.

You'll note at the end of this section, I list a few things NOT to change and one is the public facing URLs - or at least don't remove the old ones. Am I being a hypocrite here? No - I'm suggesting that the public facing URLs get commonly mapped into a single top level piece of code by mod_rewrite, and you pass in the name of the page that was called as if it's a parameter. We have whole systems full of URLs which all map to the same script ... for example:

http://www.wellho.net/share/potomaccrossing.html - Bridges over the Potomac
http://www.wellho.net/share/courseterms.html - Terms and conditions for training courses
http://www.wellho.net/share/riverside.html - Melksham's Riverside Walk

Three different URLs ... but internally all are routed to a single script, with "potomaccrossing", "courseterms" and "riverside" becoming a parameter as if it had been filled in on a form. The single script - in this case - then goes and checks with our databases for the record that matches the page name in one of its columns.
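A sketch of what such a single script might look like is given below. It assumes that a mod_rewrite rule has already delivered the page name in a parameter called page (a hypothetical name), and it uses a small array in place of the database lookup. The validation rule and the default page are also assumptions of this sketch, not a description of our actual script.

```php
<?php
// Hypothetical lookup: validate the page name once, in one place,
// then find the matching record.
function find_page($name, array $pages)
{
    // Accept only lowercase letters and digits, so that nothing
    // unexpected reaches the lookup.
    if (!is_string($name) || !preg_match('/^[a-z0-9]+$/', $name)) {
        return null;
    }
    return isset($pages[$name]) ? $pages[$name] : null;
}

// Stand-in for the database table of pages.
$pages = array(
    'potomaccrossing' => 'Bridges over the Potomac',
    'courseterms'     => 'Terms and conditions for training courses',
    'riverside'       => "Melksham's Riverside Walk",
);

// The page name arrives as if it had been filled in on a form.
$requested = isset($_GET['page']) ? $_GET['page'] : 'riverside';
$title = find_page($requested, $pages);

if ($title === null) {
    echo "Page not found\n";    // a web page would also send a 404 status
} else {
    echo $title, "\n";
}
```

Every public URL keeps working, but there is only one entry point to secure and maintain.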

Images - three views, three different URLs, all the same script

Using Modules

I'm going to finish with an encouragement to you to use standard modules in your code where you can. Typically, they'll come in the form of objects, as these have a tiny footprint and can be easily tested on their own, easily learned, and easily integrated without conflicting with other function names.

By using standard modules, you're piggy-backing on other people's expertise. And these other people are typically the enthusiasts and experts. Modules distributed by Open Source tend to be updated from time to time, and indeed feedback to the originators helps them do so. And if some new technology / standard level comes along which needs support, chances are that they'll add it. And they'll do so quicker than your team could, and probably at no cost to you. All your team has to do is to drop in the new version of the component which - if the job's been done right - will be "plug and play" with extra features switched on by an extra option on a method call, or incoming data arriving in a new format.

Some modules ship with products ... others (typically those which are more niche / specialised) need to be downloaded separately. You'll find libraries for most languages, and install tools too: RubyGems, Perl's CPAN, Python's PyPI, PEAR and PECL for PHP. As part of the investigation for the previous article I wrote, I installed the imap module on our PHP server; that's far better than writing my own email decoder (though I do have the tools to do so). And we also use MagpieRSS and MaxMind - to pick up news feeds, and to identify IP addresses back to the country and town from which our users are browsing.

Image - using MaxMind to show where visitors to our web site arrive from. This is an analysis of all of yesterday's UK visitors.

And What do you NOT want to change?

* Ability to use old data
* URLs - especially ones that have lots of external inbound links
* Anything in a rush / for the sake of it

And in summary

This is a long article, inspired by a recent customer / tailored training course. It would be easy - far too easy - for me to come across as critical as I make all these points that concern existing code. But code goes through an evolution as do systems, and it's impossible to know early on WHICH are the areas that will affect any particular project.

The inspiration behind this article is web site and PHP based.

If you're a newcomer to programming, we can teach you how to program in PHP from scratch - Learning to Program in PHP; if you've programmed before, but in another language (and perhaps not for the web), then PHP Programming would suit you better.

For delegates who already know the basics of PHP, our PHP Techniques course is an excellent second level course - it covers many of the aspects I have mentioned above, together with web2, searching, graphics and geographics and much more. A PHP revision session at the start of the course helps us dot "i"s and cross "t"s for our delegates, and during the course the tutor can also fill in any gaps which are revealed; many PHP programmers are self-taught and inevitably they'll have missed out on some tips and techniques.

Object Orientation is a big subject in its own right, and is worthy of a single day course on its own - Object Oriented PHP. For delegates who are learning PHP, and likely to be going straight into medium sized to larger projects, "OO" is a vital subject, so we run the OO day as an optional extra directly following the regular PHP beginner courses. But it can also be taken as a single day by delegates who have started on smaller projects, or perhaps projects which are large enough to benefit from the OO model but that hadn't been realised.

Finally, if you feel that a day or two of external code review could help, please ask me! Such work may provide you with some valuable thoughts and pointers. "This has paid for itself already" said a participant in such a session the week before last ... and that was before 10 a.m. on the first of two days!
(written 2011-09-24)
