January 25, 2015

Fixing damaged MySQL tables - Error 1712 and Error 2013

ERROR 1712 (HY000): Index smf_messages is corrupted

ERROR 2013 (HY000): Lost connection to MySQL server during queryA

mysqldump: Error 2013: Lost connection to MySQL server during query when dumping table `smf_messages` at row: 135457

Oops ... I don't know what to say ...
"Don't Panic" in the words of Corporal Jones (Dad's Army) [here]
"I don't BELIEVE it" in the words of Victor Meldrew (One foot in the grave) [here]

For readers who are to young to remember those shows, or aren't from Great Britain ... something seems to have gone awfully wrong with the database. If I had been using MyIsam tables, mysqlcheck would identify the problem and call repair table to (try to ) fix the issue. But I wasn't - I was using Innodb tables.

Don't panic - provided you have a reasonably recent backup, and the table in question is write-mostly (which mine is - it's forum messages)

Method

a) Create a new table structure the same as the damaged table:
  mysql> create table newmessages like smf_messages;

b) Copy messages from the end of the damaged table into the new table:
  mysql> insert newmessages select * from smf_messages order by id_msg desc limit 20;

c) Backup the newly created (short) table:
  /usr/local/mysql/bin/mysqldump -h 127.0.0.1 -u fgw -p fgw newmessages > noloss.sql

d) delete the damaged table (having double checked you have a GOOD, COMPLETE backup!):
  mysql> drop table smf_messages;

e) Extract the data you want to restore out of your backups
(Edit copy of fgw.sql to contain only the inserts for the smf_messages)
  mysql> source fgw.sql.copy

f) Trim back the end of the update table so that there's no overlap, and install it too
(Edit noloss.sql to contain only the extra posts added)
  mysql> source noloss.sql.copy

And it should be "Robert's your parent's brother" - sorry - "Bob's your Uncle" [here].

Posted by gje at 04:44 PM | Comments (0)
More about Graham Ellis of Well House Consultants
Useful link: MySQL training
Related topics: via article database

January 24, 2015

Backup procedures - via backup server

Our main web server system becomes ever more important to our business - and having procedures in place to restore systems in the event of failure becomes ever more critical.

What are we protecting against? Both hardware and software failures, and against intrusions and obsolescence (to the extent of being unuable) too. And these come in different flavours - a hardware failure (in the shape of a power supply blowing) was quickly fixed by moving the disc into a new chassis / onto a new processor, whereas a software failure that's done laten damage a time back and just surfaced can take considerable fixing!

We have three phases of backups

Phase 1
a) On server, our most dynamic of databases are backed up every hour, and our less changing databases every six hours
b) On server, our whole web site and database areas, and our /usr/local, are backed up twice a week
c) On server, a monthly backup takes in just about everything thatcan change

Phase 2
a) Our every-six-hour backup gets pulled, twice a day, onto another server (phyically in a different country)
b) Our twice a week backup similarly gets pulled onto that other server

Phase 3
From time to time, we grab copies of the backups to our own inhouse server and / or laptops

We have stopped short of transaction logging, mirror servers, etc; almost all of our server content is pretty static and the mos dynamic - the First Great Western Passenger Forum - is a free forum without income, and users shouldn't (and don't) expect robustness an order of magnitude better than the services they talk about. We lost a couple of posts the other week, and the server was down for an hour or two. Course descriptions / Well House web site hanges are uploaded from our developemnt systems for the most part, so if we loose anything on the main server we can just replace it.

Phases 1 and 2 are performed by regular time jobs under crontab - with phase 1 being shell scripts (see [here] for hourly, and [here] for twice weekly

The Phase 2 script are programmed in Expect - an easy solution to grabbing files over the network, mimicing the command line rather than some of the complexities of setting up keys and keychains. Yes, I know it's less secure, but then virtualy of all the data is public stuff anyway. The expect script is [here].

The ultimate question with all of these peocedures is "if the live system fails at the worst possible time, and in the wort possible way, how do we get it back, how long does that take, and what is the cost - financial, to the business, and in time. And those question should be the starting point as you work out your backup strategy.

Posted by gje at 03:15 PM | Comments (0)
Related topics: via article database

January 23, 2015

Which (virtual) host was visited? Tuning Apache log files, and Python analysis

We host a number of domains on our main server, and in order to avoid fragmentation of log files, we keep a sinle composite log. Rather than use a standard logfile format henceforth, I've changed the second field to carry the virtual host name accessed for the request, as that was missing up until this morning.

So in my server config
  LogFormat "%h %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\"" combined
has become
  LogFormat "%h %v %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\"" combined

Gone - Remote logname (from identd, if supplied).
Added - The canonical ServerName of the server serving the request.

I've written a program (in Python) to take a look at the log file - see [here] - and run from the command line that gives:
  -bash-4.1$ /home/wellho/trainee/y202/pytop
   13358 - www.wellho.net
    2719 - www.firstgreatwestern.info
     223 - www.melkshamchamber.org.uk
     100 - melksh.am
      62 - www.twcrp.org.uk
      39 - www.savethetrain.org.uk
      39 - www.across-the-pond.co.uk
      33 - www.wellhousemanor.co.uk
      30 - twhc.org.uk
      16 - transwilts.org.uk
       5 - thebutlerdidit.info
       1 - railcustomer.info
  -bash-4.1$

The program's also got a web wrapper - if called up on the web, it uses a different formatter:
  output = '{0:6d} - {1:s}'
  try:
    web = sys.argv[1] == "-w"
    if web: output = '<tr><td>{0:d}</td><td><a href="http://{1:s}" target="avh">{1:s}</a></td></tr>'
  except:
    pass

and later in my code:
  print output.format(counter[site],site)

And you can see the current results [here].

P.S. There's another quick demo web analysis program (showing its age) [here].

Posted by gje at 06:56 AM | Comments (0)
Useful link: Python training
Related topics: via article database

January 21, 2015

The unbalanced relationship between customer and provider

A friend describes how he returned from Paddington late one afternoon, when the station and trains were busy, and was delayed by two hours by staff who refused to accept his ticket, even though it was valid. He felt uncomfortable - threatened by staff who at one point even called in a "Gorilla" to threaten to eject him from the station, even though he was simply looking to exercise his right to travel on a ticket which I believe was completely valid - and indeed which they agreed was valid in the end.

Had the boot been on the other foot - had my friend blocked a member of the railway staff going about his valid legal business for two hours, my friend would have been in all sorts of trouble, facing arrest, sanctions, and possibly more serious consequences. But because it was the railway people who got it wrong, all they had to do was smile sweetly (and some of them did say "sorry") and that's regarded as matter closed.

There's a lack of symmetry in relationships like this - and that's a general feature of these whale and minnow / us and them / provider and user type relationships. Before Christmas, I picked up on a (I felt) rather drastic assumption by the Wiltshire Police, which I questioned. Their support officer appeared to be assuming guilt in certain cases - a very serious matter - and I question him on that. Numerous promises of a reply, a promise to come round and talk to me, but he's convenienetly neglected to do so and is probably hoping that I've forgotten the matter. I haven't, and I'm now wondering how he would he reacted if I had avoided answering his questions for a month - I suspect he would have been less than happy with me and would have taken sanctions.

I'm afraid that neither of these is isolated cases. I too have been in "trouble" for using valid tickets because (I'm assuming) the railway employee didn't properly understand the validity. And I have had a couple of other issues with the Wiltshire Police - which pale into insignificance behind a couple of other incidents in Melksham of which I know / where I have been on the peripehry in recent years. Each time ... I try to let time heal, to say "perhaps lessons have been learned and it's better now" ... but (alas) I'm mentioning current stuff / new incidents in recent weeks here.

Of course, it's very easy to have the boot on the other foot. An enquiry from another friend about an online forum (NOT ours) where the administrator has acted in quite a different manner to the way he requires his users to act had me thinking not only about the imbalance of rules / setups in those cases, but also about how we do things when we're running our own forum. His particular issue was with the other forum's administrator (a) Reading personal messages between members, (b) Commenting in a public post about a member he had banned in very uncomplimentary terms and (c) threatening to ban my friend too unless he responded sharpish to a question about what the wrote in a personal message.

Here's my "take" on the position of the site owner on a public forum

The agreement you "sign" when you join an online community such as a formal is typically unbalanced What you (as a user) agree to is not agreed to in equal and opposite measure by the operators of the forum unless they say so. For example, they may say "you may not advertise" but that's not to say they won't advertise to you. And they say "you may not invade someone's privacy" but that in itself does not mean that they can't do so.

As a forum administrator, it is very tempting to try to "control" your users. There are elements of where you have to do that - for example to stop signups / posts for the purpose of bulk unsolicited advertising of products which are completely irellevant to the membership and would dilute the forum beyond irritability to the point of unusability. And you also have to ensure that you (as the admin / owner) aren't liable for hosting copyright breaches, hateful material, indecent material, material which incites or assists in the commititng of criminal activity.

But how far do you go as an admin? There's a temptation to stop users expressing views you don't agree with, and to stop users from talking about sites which you may be considered to be competition. There's often an element of conflict here, with users who sign up (for free) having a general view that there's a "freedom of speech" on the internet, and that they should not be restricted. In my experience, it's wise to have stated policies for the administrator team, publish them, and keep them as unrestricting and trusting as possible of your regular / long term / genuine members. You'll find a lot of this in the welcome board on our forum.

I have been asked the question "how far, legally, can an admin go"? I'm no legal expert, but as stated above the rules that a user agrees to when he signs up must not be assumed to be equal and opposite in the other direction. In the UK, the admin needs to be aware of the cookie laws, of data protection / registration laws (which, however, may not apply to a "club" type site where everyone is a voluntary member). And whilst the standard forum software packages usually keep things like private messages (PMs) between members carefully, and avoid them being read widely (or indeed by the moderators and admins), I would be hard pushed to find any legal basis if it wasn't in the forum agreement that they must be kept private. Morally, I think it's wrong to read personal messages - but the temptation can be there for an insecure admin who fears that a member has signed up purely to contact members who are susceptible to be poached to a competitive site, and wants to know if such conversations are going on. At the end of the day, the admin can say "it's my site, my rules, and my interprettation of my rules. If you don't like it, you can leave".

I will add one caveat. We are all subject to laws such as defamation, discrimitation, equallity, harressment, copyright, decency, incitement, libel, commercial contract laws that we sign up to, etc. And that includes administrators. So a site owner can only post up that "Walter was banned because he slept with my girlfriend" or "Sheila has been banned for poaching people to a competitive site" if he can prove that in a court of law ... though in reallity it's pretty unlikely that either Sheila or Walter would take it that far, except on a point of principle. There *are* people who takes these things further on principle - we've come across four (to my recollection) over the years who have threatened us as admins over what has been posted. In two cases, the complaints were about copies of quotes from reputable newspapers, which we found amazing. In another, a retrospective copyright restriction was applied so "please help us by pubicising this" became "please take this down - you don't have our permission to still have it there". And in the final case, an individual felt he was close to the description of someone in a post, and that it was aimed at him. In all 4 cases, it should be noted, that the protagonist would not have had to use their personal funds to take the matter further (insurer, 2 x legal aid, 1 x public body on taxpayer's money). As admins in all four cases (and that's not a bad number with 166,000 posts!) we have simply removed the post or thread from pubic view. In three cases, we made it very clear that this was a pragmatic action - we felt that we had done nothing incorrect, but didn't have the time or inclination for a principled fight over something petty. In the final case, a court judgement against the paper we had quoted was brought to our attention, and we had a very friendly exchange with the person concered and sorted it on really good terms.

And in relation to the First Great Western Coffee Shop?

It's probably worth saying ...

a) The general ethos is that if you treat the moderator and admin team as you would wish to be treated, they'll treat you as a member or as a guest in the same way

b) For the most part, moderator and administration actions are positive - answering questions, welcoming member, helping keep topics in the right place, and so forth. Where something may necessite something more, it's discussed around the moderator team and if need be enquiries are made, without going public, unless immediate action is needed.

c) The forums is strong and it shares views, including those which are not held by the moderators, openly. And we don't mind people talking about other forums, services, etc. We do draw the line at advertising of unrelated products (especially by people who sign up just to advertise), and posts which facilitate law-breaking - such as those which provide information on fare evasion (mind you, such information can easily and accidentally be posted, and the issue is usually solved with a coulpe of messages rather than by waving a stick)

d) Personal messages are NOT read. There might (at some future date) be a time that we're legally required to divulge something, and there mignt be times where I in my technical role of database admin have to check data if we have s problem, and said messages might flash up on my screen as I sort things out / debug. It's never happened yet, though.

e) We try to be as open as possible with policies and decisions, making sure that there's published details of what actions are and may be taken, and why. Occasionally, such as when we're looking to reduce the number of "spam" signups (they can drown out genuine ones!) we'll not publish the algorithms.

f) We try to keep as much of the information on the forum as we can public - after all, we're a forum for passenger and we want casual / occasional visitors to be able to gain from the site with the minimal of requirements on them

g) We are free from advertising and signup fees. In the future, funding the service could become an issue, so I'm not saying "never", but any steps taken would be reluctant and measured.

Posted by gje at 10:33 PM | Comments (0)
Related topics: via article database

January 19, 2015

Finding sum, minimum, maximum and average in Python (and Ruby)

A fresh example (in Python) from today's "learning to program in" course ... finding the number, sum, minimum and maximum of a series of numbers typed in. See [here].

We start off with zero for the count and sum to date, but we do not initialise the minimum or the maximum as we don't know what they would be initialised to - instead, when we read the first value we store that into the minimum (so far) and maximum (so far) variables.

Note the contrast to the similar Ruby example ([here] in which we initialise the variables to nil - a slightly different strategy caused by different scoping rules.

In all cases, we also need to check for a "divide by zero" if there is no input data. My Python program from today does that, I note; the previous Ruby one doesn't have that in.

Posted by gje at 06:04 PM | Comments (0)
Useful links: Python training, Ruby training
Related topics: via article database

Selecting RECENT and POPULAR news and trends for your web site users

How do we give people recent news, or talk about recent trends or popular posts>?

The obvious answer is to take data for the last "n" days or months and analyse that. But whilst that's the obvious answer, it's also obviously a fairly crude measure as it gives undue influence as it changes to what happened at the distant (early) end of the period being analysed. For example, we hear about the "annual rate of inflation" and sometimes it's said to go down ... not because of something recent, but because a big rise in the price of [petrol/electricity/vegetables] has fallen off the equation from a year ago.

A far better solution is to run some sort of weighted measure where very recent events are given a significantly higher weight than old ones - and here's an example of how I've done this in a recent requirement to report on the currently most liked messages on our First Great Western Passenger Forum.

Coding in PHP:

  $scoreboard = array();
  $countdown = 50;
  $showrows = $countdown / 2;
  $r = mysql_query("select * from lykes order by timing desc limit $countdown");
  while (list($id,$post,$user,$when) = mysql_fetch_row($r)) {
    $scoreboard[$post] += ($countdown *= 0.92);
    # $scoreboard[$post] += $countdown--; /* Alternative to previous line */
  }

I have chosen to use three constants which work for this particular site at present:
• Consider most recent 50 likes
• Scale down the significance between each like to 92% of the previous one
• Display a maximum of half the number of articles which are in the window we've looked at
and while I could easily suggest more tuning (such as trimming article likes by date, differing factors and so forth) I've chosen to publish here with constants to make the algorithm easier to follow.


Click on image for most recent report!)

See the full code [here] (it turns out that the presentation code is far longer than the decaying average algorithm and try it out [here]. Find out about our PHP courses [here].

Posted by gje at 08:31 AM | Comments (0)
Related topics: via article database

Commenting out an echo killed my bash backup script

"Commenting out an echo killed my bash script" - yes, really!

Here (with password amended!) is part of my bash script which takes a database snapshot from time to time - and it's been running well over a 24 hour cycle, giving me checkpoints to return to. But it was getting a bit verbose, so I added a simple # in front of the successful emailing ine.

  /usr/local/mysql/bin/mysqldump -ufgw -psummathere -h127.0.0.1 fgw > /home/backups/$HOUR/fgw.sql
  if [ "$?" -eq 0 ]; then
    echo "Success $HOUR fgw" | mail -s "FGW good dump" graham@wellho.net
  else
    echo "Mysqldump encountered a problem $HOUR fgw" | mail -s "FGW BAD dump" graham@wellho.net
  fi

My change got rid of the good backup message which had been "spamming" me during testing ... but it also got rid of the rest of my backup procedure. Oops. It turns out that you're not allowed an empty block in an if!

Advise is to refactor the script - or to add a null statement into the block if you want to retain the ability to uncomment the echo and start emailing again (or whatever your programmer-switchable action is!). Suitable null statements are
  :
(yes, a sinle colon) or
  echo nothing > /dev/null
but please note I have not yet tested these. I'm more concerned, as I write this, to be handcranking a backup and watching the backup pots refill with complete data sets!

Posted by gje at 02:41 AM | Comments (0)
Related topics: via article database

Well House Consultants Ltd.
Copyright 2015