Long character string in a WHERE clause filter?
Posted by Patrick (Patrick), 20 April 2004Hi,
I was wondering if theres is some performance issues/problems or considerations when running this query:
SELECT ... FROM A
WHERE column_x = 'AEBDECEZ....AEC'
where 'AEBDECEZ....AEC' is a 150 character string, for exemple.
'AEBDECEZ....AEC' is a generated code corresponding to different choices made by the front-end user! Each different choices registered based on a questionary (A list of questions with different answers options) will generate a different string code!
I was also thinking to index the column_x to make the query faster!
Shoud I use a text format field to store my string code? Shouldn't be this better in a Binary format? Will an index on such a field be efficient as for numeric data type?
To optimally do this I would like to know if there is a well known technique or method to realize such generated code or system (different options to choose generates different optimized UNIQUE code)?
How can I optimize this system for fast retrieval/search/match?
Is this a performance issue to run such query based on a long character string filter (a generated code)? Should I consider something special to increase a performance aspect of such queries?
I heard that it is best to use numeric types but I am limited with the numeric width data type!
Thank you very much for your tips.
Posted by admin (Graham Ellis), 20 April 2004Couple of comments to start with
a) In MySQL, "Binary" simply means case sensitive. If you make your strings binary, it will speed the tests us a little as it won't have the extra testing to do to see if things match but have a different case.
b) You're right, efficiency wise, to be using an equality test rather a like test.
Now - some further questions for you. How much data do you have - hundreds, thoudands or millions of rows? How active will the database be - volumes of requests?
It might be that it's not really worth trying to optimise. And an index is a tradeoff of space v speed. Also - do you need the whole of the 150 character string ina single column? If it logically splits into two colums or more you might be able to write a more complex but quicker test that rapidly eliminates loads of unwanted rows.
Posted by Patrick (Patrick), 20 April 2004First: thank you very much for your advises!
My "problem" or model is this:
I have to modelize a Form containg x questions. For each questions I'll have several options (answers) possible. Some of the questions will accept just one answer others will accept one or more.
Each end-user (through his browser) will mark his own options based upon his choice or opinion.
My database will contain different forms as I will have different customers that want to send their own "questionary (form)" to their own customers (Front-end users).
My customer should then have some statistics generated on these choices. So, instead of generating a complicated and non-optimazed query based on several AND and OR clause in the WHERE clause for each combination of the options (WHERE optionA_from_question1 = 1 AND optionB_f_q2 = 1 etc... and make the same query for each possible combination ... ), I would like to generate a unique code based on each possible combination to have something like this when runing stat:
SELECT COUNT(*) FROM ... WHERE col_code = 'unique_code_string'
--> count the numbers of front-end user that make this special combination of options choice!
AND my flied containing these 'unique_code_string' will be indexed to speed up retreival.
.If, for exempel, I will have 1000 users, I will then have 1000 or less unique code generated (1000 if they have each one made a different choice) and I will have 1000 rows (one row per front-end user that has filled his form).
.If it's a one day campaign: I will assume that the database activity and the volume of request depends on how many front-end user the customer will have. At this point this may not be an issue as they are each filling up their respective forms and the web browser or the java-apache-web-server will generate a corresponding unique code for this specific combination of options choosed by the front-end user. It will be only at the end of the campaign that the customer will run statistics on several combination of options choosen by his front-end users (may be his own customers).
.An index is a tradeoff of space v speed: I won't have problem of space on my disks.
Is this system/concept a good practice or idea? Do you think I will expreience some problems and performance problems with such long strings? Should I use numeric values even if they are contained in a character string?
Do you have another idea or concept about how can I implement this system in another fashion or optimize it?
Thank you so much for your attention and tips!
PH: 01225 708225 • FAX: 01225 793803 • EMAIL: firstname.lastname@example.org • WEB: http://www.wellho.net • SKYPE: wellho