Training, Open Source computer languages
PerlPHPPythonMySQLApache / TomcatTclRubyJavaC and C++LinuxCSS 
Search for:
Home Accessibility Courses Diary The Mouth Forum Resources Site Map About Us Contact
 
NEW SERVER - 11.9.2020 ... Retiring, March 2020 - sorry, you have missed our final public course.
The Coronavirus situation has lead us to suspend public training - which was on the cards anyway, with no plans to resume

Please ask about private 'maintenance' training for Python, Tcl, Perl, PHP, Lua, etc
Happily continuing private consultancy / programming work
 
store file names or store files themselves?

Posted by that_guy (that_guy), 28 July 2003
Hi. I have a really loosely defined aglomeration of data- the users want to run various experiments on a network of machines- such an experiment might imply creating various vlans, running certain programs on certain machines in a certain order, and most importantly wrt to MySql, recording results/output of of these programs. I'm having a heck of time formulating tables for this - but one of the first things I'm trying to decide: if I need to store the output of say tcpdump, does it make sense to store it in the database as a large text object, or to tell expect, which is managing all these processes to specify a log file, and then keep the location/name of that log file in the database. Or a completely different approach and save any sort of textual output in CVS.  Long story short , does it make sense to store potentially large program output directly in a database, or is better to keep a reference to a log file?

Posted by admin (Graham Ellis), 29 July 2003
You ask a very good question indeed - and the design of the underlying data structure behind a system or application  is vital.

Codd's 12 rules for a truely relational database include specifications that all data should be presented in the form of tables, and accessed (from them) without ambiguity.  Read through them (see the link above) and you'll see that he 's coming very much down in favour of keeping the data in tables.  But then, he's talking about databases so perhaps that's a "given" to start with, and his "rules" are more like "ideal guidelines" really ...

The question "do I store all the data in tables or provide tables of file names" is very similar to the question that web designers ask themselves - "Do I provide a link to an external page, or do I copy the data into my page"?   The first crucial question to ask yourself is "what happens if the data changes?"  If the data changes, do you want to be responsible for the software / managing it in/out of the tables, or might that be a totally independent action where your tables are simply going to maintain link resources?

Managing all the data in a table:
* You can ensure data locking and data integrity
* All updates must be done by SQL / your routines
* Data is going to have to be checked in when its updated

Maintaining a table of links to files
* Users can change their data offering very easily and without your knowledge
* Users don't have to remember to check in changes but they do have to remember to check in the original
* Data searching in the files linked to may not be possible, and certainly won't be possible through SQL
* You'll need to have code to deal with missing files (orphan links)
* Users can keep multiple copies of a file and not realise which one the tables reference

My own view?  If there's anything like a serious number of experiment results / if you want to keep additional inforation about each experiment (e.g. have the results been analysed, who did it, etc), the I would suggest that you keep it all in a relational database.   It may be more work to set up, but the extra work will be repayed many times over in lowering support costs and once they know it the users will have a consistent interface.

You might want to have experiments output to a file initially, and then have their results checked in to the database - say you had to abort a long experiment halfway through (e.g. a sound test and a plane flew over) you might not want to even store the fact that the experiment was started.   This approach also means that long experiments won't leave long inserting sessions running which will help with users making parallel selects.

You ask also about alternatives such as CVS.  That's a version system; are you running a series of experiments that move on from one version to another, each of which updates the information stored for the previous experiment?  Thought not   ... then perhaps not CVS.

XML is another possibility that rears its head, especially for the data that you're having trouble formulating into tables.  This one might be more appropriate than CVS;  you might still want to keep simple (but perhaps large) tables for the experiment management information, but then use a markup such as XML within your experiment results, held within SQL tables as a large text or blob field.  If someone says "I'm having a heck of a job formulating tables", then perhaps a markup langauge is better suited for that part of the data.

Hope that gives you some starter thoughts ....

Posted by that_guy (that_guy), 29 July 2003
thanks, for the response- you're the most thorough forum administrator I've yet seen I will have to give this some thought, unfortunately, there are several complicating factors. The main constraint being time- I have approximately 2 weeks, and the scope of the problem is only weakly delineated.  A little context for why I even mentioned CVS was that a group of researchers may be collaboratively building/revising an experiment, if an experiment was encapsulated as a formatted flat file(XML) of configuration parameters as you suggest, then CVS could potentially allow them to track when they did what and roll back to a version of the configs that offered good results, but I agree this falls short of what  a database can offer.   The rough sketch of what I've done so far with this problem is make a tk gui with callbacks to some expect scripts to gather network settings and make config changes. I wonder whether something like tcl's Scotty would'nt be a better approach, but I'm not sure I'll have time to find/learn the best tool for the job. The config changes made with this gui are initially held in tcl arrays, committed to a mysql database when the user saves, and sent to the actual hardware when they opt to launch an experiment. I've got a table of {experiment_name, vlan_name, port#},  a table of cable configs that maps port# to hostname/interface, and a table that stores {program_name,program_parameters, host, experiment_name,universal seq id} The seq id is there to make sure all the programs run in a certain order. Do you have any feelings on Scotty/tnm/tkined? It looks as though development for them stopped a few years ago.
(I'll repost this question to the tcl/tk forum to be more orderly.)
~


Posted by admin (Graham Ellis), 29 July 2003
With the limited time you have, I would tend to keep it simple.  In an ideal world, you would define all the tables and columns that  would be needed in the longer term but as you haven't got that time, do what you can.  Important note - be careful to keep the data carefully structured.  That way you (or someone else) could come back later and add columns / restructure tables and the data (which rapidly becomes the biggest investment) would still be useable.

In the short term, perhaps it doesn't much matter what tool / language you use, as MySQL clients can be written in a wide variety of languages and data that's been added to the database from a client in one language can easily be accessed by clients in another language.

Scotty and the rest ... Hmm ... I've wondered too, but I'm on a little thinner ice here - done a lot of Tcl / Tk / Expect, and also a lot of MySQL - but the MySQL has been from Perl, PHP, Java and C, and I'm away from base this week without much of my Tcl resource.   You're using Tcl because you need expect, are you?  If I was looking for a quick fix, I might uses expect to drive the client mysql program and get in that way, or look at building the C API into my own Tcl command, or use a raw socket in Tcl; others will be horrified by the suggestion, but I only make it because I know those areas and know a lot could be done in two weeks - have you any experience in any of those areas by any change ?

P.S.  If you use expect to talk to MySQL at the same time as your other processes, remember you'll need to save and restore the spawn_id variable - its very non-OO



This page is a thread posted to the opentalk forum at www.opentalk.org.uk and archived here for reference. To jump to the archive index please follow this link.

You can Add a comment or ranking to this page

© WELL HOUSE CONSULTANTS LTD., 2020: Well House Manor • 48 Spa Road • Melksham, Wiltshire • United Kingdom • SN12 7NY
PH: 01144 1225 708225 • FAX: 01144 1225 793803 • EMAIL: info@wellho.net • WEB: http://www.wellho.net • SKYPE: wellho