Objects, multithreaded web servers and persistence
Posted by fastbloke (fastbloke), 7 January 2005
Hi,
We run a three tier architecture. Content is stored in a db and is rendered by php on our app tier. Requests for this content are received by php running on the web tier. php on the web tier requests the rendered content from the app tier and returns html, xml etc.
I am trying to cache the content on the web tier to avoid lengthy round trips to the db. I would like the php on the web tier to 'ask' the php on the app tier whether the content has changed. If it has, the app tier should send the new copy, which the web tier then stores locally; if not, the web tier should return the copy it cached previously.
Content could be datetime stamped or given a unique id. Undecided as yet.
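Here is a minimal sketch of the 'ask before fetching' idea. It is written in Python for illustration rather than PHP. `fetch_if_changed` is a hypothetical stand-in for the call from the web tier to the app tier. The version is assumed to be an integer counter, but a datetime stamp or unique id would work the same way.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional, Tuple

# Hypothetical app-tier call: given a content id and the version the web tier
# already holds (None if it holds nothing), return None if that version is
# still current, otherwise (new_version, new_body).
FetchIfChanged = Callable[[str, Optional[int]], Optional[Tuple[int, str]]]


@dataclass
class CachedItem:
    version: int  # assumed integer counter; a datetime stamp works the same way
    body: str


class WebTierCache:
    def __init__(self, fetch_if_changed: FetchIfChanged):
        self._fetch = fetch_if_changed
        self._items: Dict[str, CachedItem] = {}

    def get(self, content_id: str) -> str:
        held = self._items.get(content_id)
        reply = self._fetch(content_id, held.version if held else None)
        if reply is not None:
            # Content changed (or was never cached): keep the new copy locally.
            held = CachedItem(*reply)
            self._items[content_id] = held
        if held is None:
            raise LookupError(f"app tier sent nothing for {content_id!r}")
        return held.body


# Usage with a fake app tier that records every full transfer.
app_store = {"home": (1, "<p>v1</p>")}
transfers = []


def fake_app_tier(content_id, have_version):
    version, body = app_store[content_id]
    if have_version == version:
        return None  # "not changed": only a tiny reply crosses the wire
    transfers.append(content_id)
    return version, body


cache = WebTierCache(fake_app_tier)
first = cache.get("home")    # full copy transferred
second = cache.get("home")   # served from the local copy
app_store["home"] = (2, "<p>v2</p>")
third = cache.get("home")    # new copy transferred
```

The `_items` dictionary only lives as long as the process that holds it. Under a PHP web server each request starts afresh, which is why persisting the object between requests becomes the real question.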
Having read through this page on persistent db connections - http://uk2.php.net/manual/en/features.persistent-connections.php - I wondered whether it is possible to persist any object in the same way. I would like an object to contain the local copy of the content and its datetime stamp/unique id. The web tier php would refer to this persistent object for local content and ask the app tier for any updates.
Can singletons achieve this? Are there alternatives I could look at?
Thanks in advance
Posted by admin (Graham Ellis), 9 January 2005
The "conventional" way of persisting data is to use sessions and copy information into the $_SESSION super-global ... but I don't think that's quite what you're looking for, as you want to share data between different users, I think? You have some pieces of "hot" data that many users want to read within a very short time frame.
a) Firstly, you SHOULD use MySQL persistent connections for efficiency - see the page you referred me to.
b) You could write a set of caching functions yourself, and have them use a local MySQL database engine on the web tier, perhaps with memory-based HEAP tables. This probably sounds a frightening prospect, but I think it would be quicker to develop than you might expect and would work pretty well. It'll certainly let a single request to the application tier be shared between all the threads on your web tier.
c) You could set up a MySQL replica server (read only) on your web tier. Then all read requests would be local, and only write requests would need to go back to the app tier. I understand this is the approach that Slashdot takes - and they're talking (as I recall from Brian Aker's talk) about 500 hits per second.
d) What are the "metrics of slowness" of the access to the application tier? You might find that a bit of careful data denormalization could help - see an article I wrote last week in our solutions centre.
EXCELLENT question, by the way. I don't think for a moment that I'll have provided a complete or ideal answer, but hopefully my thoughts will set you working out what is and isn't practical, and please do come "back at me" for more!
Posted by fastbloke (fastbloke), 9 January 2005
Graham,
Thanks for getting back to me on this gnarly question!
I am trading the speed of returning a static html page against the reliability of holding content in memory. Holding content reliably and consistently on a shared disk system is bothersome - hence the in-memory caching. I want to avoid round trips to the DB if at all possible, so I have implemented shared mem and semaphores on the app and web tiers. The web tier retains a cached shared copy of the latest content as a serialised object. This is nice...but only half the solution.
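Here is a rough Python analogue of a serialised object held in shared memory and guarded by a lock. It assumes `multiprocessing.shared_memory` in place of PHP's shmop and a `multiprocessing.Lock` in place of the semaphore. Such a lock is only shared by processes that inherit it, so unrelated server processes would need a named semaphore instead. The 8-byte length header and the segment size are arbitrary choices for the sketch.

```python
import pickle
import struct
from multiprocessing import Lock, shared_memory

HEADER = struct.Struct("Q")  # 8-byte length prefix in front of the pickled payload


class SharedSnapshot:
    """One serialised object in a shared memory segment, guarded by a lock."""

    def __init__(self, size, lock, name=None, create=False):
        self.lock = lock  # stands in for the semaphore
        # size is ignored by SharedMemory when attaching (create=False)
        self.shm = shared_memory.SharedMemory(name=name, create=create, size=size)

    def write(self, obj):
        data = pickle.dumps(obj)
        end = HEADER.size + len(data)
        if end > self.shm.size:
            raise ValueError("object too large for the shared segment")
        with self.lock:
            self.shm.buf[HEADER.size:end] = data
            HEADER.pack_into(self.shm.buf, 0, len(data))

    def read(self):
        with self.lock:
            (length,) = HEADER.unpack_from(self.shm.buf, 0)
            data = bytes(self.shm.buf[HEADER.size:HEADER.size + length])
        # A freshly created segment is zero-filled, so length 0 means "empty".
        return pickle.loads(data) if length else None

    def close(self, unlink=False):
        self.shm.close()
        if unlink:
            self.shm.unlink()


# Usage: one handle creates the segment, a second attaches to it by name.
lock = Lock()
writer = SharedSnapshot(64 * 1024, lock, create=True)
reader = SharedSnapshot(0, lock, name=writer.shm.name)
empty = reader.read()
writer.write({"content_id": 3, "stamp": "2005-01-09 12:00", "html": "<p>hello</p>"})
latest = reader.read()
reader.close()
writer.close(unlink=True)
```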
On the app tier I would like to do something similar but with a twist. The object(s) in shared mem should be 'running' independently - checking for content updates and maintaining an up-to-date copy of the content on the app tier, ready to return new content to the web tier on request OR to reply that the content has not changed since the last request. Is that do-able?
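One way the app tier 'object' could run independently is as a background poller. The sketch below is in Python, for illustration only. It assumes a hypothetical `load_all` function that returns every content id with its current version and rendered body in one DB round trip. The web tier's request then becomes a cheap `changed_since` lookup that never touches the database.

```python
import threading
import time
from typing import Callable, Dict, Optional, Tuple

Entry = Tuple[int, str]  # (version, rendered body)


class ContentWatcher:
    """Background poller that keeps an up-to-date copy of rendered content."""

    def __init__(self, load_all: Callable[[], Dict[int, Entry]], interval: float = 5.0):
        self._load_all = load_all        # hypothetical: one DB round trip
        self._interval = interval
        self._lock = threading.Lock()
        self._current: Dict[int, Entry] = {}
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)

    def start(self):
        self.refresh()                   # prime the copy before serving requests
        self._thread.start()

    def stop(self):
        self._stop.set()
        self._thread.join()

    def refresh(self):
        fresh = self._load_all()
        with self._lock:
            self._current = fresh

    def _run(self):
        # Event.wait returns True once stop() is called, which ends the loop.
        while not self._stop.wait(self._interval):
            self.refresh()

    def changed_since(self, content_id: int, version: Optional[int]) -> Optional[Entry]:
        """None means "unchanged" (or unknown id); otherwise the new (version, body)."""
        with self._lock:
            entry = self._current.get(content_id)
        if entry is None or entry[0] == version:
            return None
        return entry


# Usage with a dict standing in for the database.
db = {3: (1, "<p>three v1</p>")}
watcher = ContentWatcher(lambda: dict(db), interval=0.02)
watcher.start()
initial = watcher.changed_since(3, None)    # web tier holds nothing yet
unchanged = watcher.changed_since(3, 1)     # web tier already has version 1
db[3] = (2, "<p>three v2</p>")
deadline = time.monotonic() + 2.0
while watcher.changed_since(3, 1) is None and time.monotonic() < deadline:
    time.sleep(0.01)                        # wait for the poller to notice
updated = watcher.changed_since(3, 1)
watcher.stop()
```

Reloading everything on each poll keeps the sketch short. A real poller would more likely ask the database only for rows whose stamp is newer than the last poll.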
The ideal situation I am aiming for is where the app and db tier communicate only when content needs updating. The web and app tier may communicate often but will not require a further trip to the DB.
Hopefully that's not asking too much!!!
Posted by admin (Graham Ellis), 10 January 2005
Thinking back - I recall that you're using another database product (not MySQL), aren't you? So some of the options suggested may not be usable. Certainly if you have a replica server mechanism available, and run a replica at the web level and / or a memory-based table system there, the app tier will inform the web tier only when a change is made. Further, your database's query cache may save the rerunning of expensive queries.
I think the crux of the matter is that you need your Db tier to be proactive in telling your intermediate (cache) level that there's been a change. If your database's cache doesn't provide what you need, I would be tempted to have all Db reads call to the cache tier and return EITHER a previous result if the same request has been made in the last (say) 5 minutes, OR the result of a new query if there wasn't an old one. The cache would then store query results for 5 minutes.
Database changes would be routed via the cache too, which would (a) update the data and (b) look through cached result sets for dependencies on the data that's just changed, and drop those cached result sets. A new request for that result, even within 5 minutes of a previous identical request, would then cause a new SELECT to be run.
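The read/write routing described above might look roughly like this in Python. The sketch assumes each caller declares which tables a query reads or writes, with no SQL parsing, so invalidation is per table rather than per row. `run_query`, `run_write` and the injectable `clock` are hypothetical hooks.

```python
import time
from typing import Any, Callable, Dict, FrozenSet, Iterable, Tuple


class QueryCache:
    """Result cache with a time-to-live plus table-level invalidation on writes."""

    def __init__(self, run_query: Callable[[str], Any], ttl: float = 300.0,
                 clock: Callable[[], float] = time.monotonic):
        self._run = run_query
        self._ttl = ttl                  # 300 s = the "5 minutes" above
        self._clock = clock
        # sql text -> (expiry time, tables the query reads, cached result)
        self._entries: Dict[str, Tuple[float, FrozenSet[str], Any]] = {}

    def select(self, sql: str, tables: Iterable[str]) -> Any:
        now = self._clock()
        hit = self._entries.get(sql)
        if hit is not None and hit[0] > now:
            return hit[2]                # same request within the TTL
        result = self._run(sql)
        self._entries[sql] = (now + self._ttl, frozenset(tables), result)
        return result

    def write(self, sql: str, tables: Iterable[str], run_write: Callable[[str], Any]) -> None:
        run_write(sql)                   # (a) update the data
        touched = set(tables)
        # (b) drop every cached result set that depends on a touched table
        stale = [key for key, (_, deps, _) in self._entries.items() if deps & touched]
        for key in stale:
            del self._entries[key]


# Usage with a fake database and a controllable clock.
runs = []


def fake_select(sql):
    runs.append(sql)
    return f"rows for {sql}"


now = [0.0]
cache = QueryCache(fake_select, ttl=300, clock=lambda: now[0])
news = "SELECT * FROM news"
cache.select(news, ["news"])             # runs the query
cache.select(news, ["news"])             # served from the cache
cache.write("UPDATE news SET title = 'x'", ["news"], lambda sql: None)
cache.select(news, ["news"])             # runs again: result was dropped
now[0] = 301.0
cache.select(news, ["news"])             # runs again: result expired
```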
By all means use shared memory for the intermediate level.
I think this gets pretty close to your "database access only when the data needs changing" ... not totally there, because the database is also going to be called once every 5 minutes for each query, or whenever the data affecting a query has changed.
An option within the scheme I've outlined would be to rerun all cached queries affected by a database change rather than just dropping them from your cache; whether this would help or hinder efficiency depends on the metrics of your application. I can foresee a lot of queries being run to update the cache which are then never called up from the cache.
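Here is a sketch of the eager-refresh option, again in Python. It makes the same assumption that callers name the tables each statement touches. The `wasted_refreshes` counter is a hypothetical metric added to show the trade-off: it counts refreshed results that were replaced again before anyone read them.

```python
from typing import Any, Callable, Dict, FrozenSet, Iterable, Tuple


class EagerRefreshCache:
    """On a write, re-runs dependent cached queries instead of dropping them."""

    def __init__(self, run_query: Callable[[str], Any]):
        self._run = run_query
        self._entries: Dict[str, Tuple[FrozenSet[str], Any]] = {}
        self._reads: Dict[str, int] = {}   # reads since the last (re)computation
        self.query_runs = 0
        self.wasted_refreshes = 0          # refreshes replaced before anyone read them

    def _execute(self, sql: str) -> Any:
        self.query_runs += 1
        return self._run(sql)

    def select(self, sql: str, tables: Iterable[str]) -> Any:
        if sql in self._entries:
            self._reads[sql] += 1
            return self._entries[sql][1]
        result = self._execute(sql)
        self._entries[sql] = (frozenset(tables), result)
        self._reads[sql] = 1               # this caller consumed the fresh result
        return result

    def write(self, sql: str, tables: Iterable[str], run_write: Callable[[str], Any]) -> None:
        run_write(sql)
        touched = set(tables)
        for key, (deps, _) in list(self._entries.items()):
            if deps & touched:
                if self._reads[key] == 0:  # the previous refresh was never read
                    self.wasted_refreshes += 1
                self._entries[key] = (deps, self._execute(key))
                self._reads[key] = 0


# Usage: two writes in a row, with no read in between.
cache = EagerRefreshCache(lambda sql: f"rows for {sql}")
news = "SELECT * FROM news"
cache.select(news, ["news"])
cache.write("UPDATE news SET title = 'a'", ["news"], lambda sql: None)
cache.write("UPDATE news SET title = 'b'", ["news"], lambda sql: None)
result = cache.select(news, ["news"])
```

Comparing `wasted_refreshes` with `query_runs` on real traffic is one way to measure whether the eager variant pays off.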
Posted by fastbloke (fastbloke), 14 February 2005
Graham,
I know it's been a while since I touched this, but we have come up with something quite juicy.
First of all - correct, we are using Informix, but the solution we have working is independent of DB, scheduler and file system...but is still UNIX specific.
We use a combination of Shared Mem, Init (Solaris specific), pcntl and db session ids.
The basic idea is that the php that generates content is 'fired' by a php daemon process. pcntl is used to spawn children in a controlled manner, based on the ability to get a db session id and update a db table saying that this process id owns that db session id (a kind of semaphore). This daemonized php periodically checks for new content on the app tier. It then tells the php on the web tier (by making an http call to it) to 'come get content ids 3, 5, 7 and 9'. The php on the web tier grabs the freshly changed content and stores it in shared mem (shmop) for later retrieval. Content is stored in an array in shared mem, one array element per content id. Changes to content require a drop and recreate of the shared mem.
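The web-tier half of that scheme can be sketched in Python, with `multiprocessing.shared_memory` standing in for shmop. The segment name, the pickled dictionary layout and the `fetch` callback are assumptions. Locking is left out, so a reader arriving between the drop and the recreate would briefly see an empty array. That is the gap the semaphores mentioned earlier would close.

```python
import os
import pickle
from multiprocessing import shared_memory
from typing import Callable, Dict, Iterable

# Hypothetical segment name; one segment holds the whole content array.
SEGMENT = f"content_array_{os.getpid()}"


def parse_ids(value: str) -> list:
    """Turn the daemon's 'come get' list, e.g. '3,5,7,9', into [3, 5, 7, 9]."""
    return [int(part) for part in value.split(",") if part.strip()]


def read_array(name: str = SEGMENT) -> Dict[int, str]:
    try:
        shm = shared_memory.SharedMemory(name=name)
    except FileNotFoundError:
        return {}                          # nothing cached yet
    try:
        # pickle ignores any padding after the pickled data
        return pickle.loads(bytes(shm.buf))
    finally:
        shm.close()


def refresh_ids(ids: Iterable[int], fetch: Callable[[int], str], name: str = SEGMENT) -> None:
    array = read_array(name)
    for content_id in ids:
        array[content_id] = fetch(content_id)   # one element per content id
    data = pickle.dumps(array)
    # Drop the old segment, then recreate it at the size the new array needs.
    try:
        old = shared_memory.SharedMemory(name=name)
        old.close()
        old.unlink()
    except FileNotFoundError:
        pass
    shm = shared_memory.SharedMemory(name=name, create=True, size=len(data))
    shm.buf[:len(data)] = data
    shm.close()                            # segment persists for other readers


# Usage with a dict standing in for the app tier.
fake_app = {3: "<p>three</p>", 5: "<p>five</p>", 7: "<p>seven</p>", 9: "<p>nine</p>"}
refresh_ids(parse_ids("3,5,7,9"), fake_app.__getitem__)
fake_app[5] = "<p>five v2</p>"
refresh_ids(parse_ids("5"), fake_app.__getitem__)
snapshot = read_array()

# Demo clean-up: remove the segment so it does not outlive this script.
leftover = shared_memory.SharedMemory(name=SEGMENT)
leftover.close()
leftover.unlink()
```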
Thus we can schedule content generation and store it independently of disk. Indeed, the php daemonisation (great word huh?) should be easy to repurpose for handling incoming feeds, handling queues etc.
Posted by admin (Graham Ellis), 14 February 2005
Good things take time to develop ... glad to read that you're making what sounds like an excellent solution there. I'm out of the office, so I'm only having the briefest of check-ins - I'm going to come back and read more fully at the weekend.
And that IS a magnificent word you've coined ... I'll bet it gets sent to our spell-checking expert as being unique on the site.