I wanted to drop in and explain my lengthy absence, especially given some interesting "part two" articles in my queue that several of you have emailed me hoping for an update. In particular there's a pending summary of all the feedback regarding "Why All the Hate", the long overdue "Blue Child" installment (regarding end to end development of a Perl Catalyst application) and on a lighter note, my decision regarding "Is it time to rotate the Profile Pic?" There's a few other items in my PENDING IRONMAN BLOGS directory as well.
Earlier this summer I was diagnosed with an early form or tendonitis, and I've been struggling to avoid surgery to repair it. That has limited my typing time substantially and although I've been using voice recognition to help pick up the slack, I can't use it on my train rides to and from work (it annoys the other passengers too much, similarly to what a loud mobile phone conversation might provoke). So I guess I'm out of the Ironman running, although I am cool with that since so many of you are stepping up to the job. I also recruited a new blogger, a Perl newbie, who is trying to record his learning Catalyst adventures.
Anyway, I'm not here to bitch about all that. I am really annoyed by the following issue regarding database design and that's what I am going to bitch about. My monumental annoyance is as follows:
Lookup tables.
I'm not talked about generic lookup tables (see this article for what's bad about those.) I'm talking about the kind most of us use all the time. For example, you have the logic "Each Address must have a State". As part of normalizing, you create Address and State tables, with a foreign key constraint between Address and State.
This seems natural but always gives me a queasy feeling. To me, they seem to be blending the boundary between schema definition and constraints with actual row data. This makes it hard without using triggers to have special constraints on certain types of rows in the look up. For example, California has a lot of special rules regarding pretty much everything, which results in tons of evil code like:
if($row->name eq 'CA') { ## do something special }
Careful design in your ORM can mitigate the issue. But I still don't like what seems to be a very hard solution to a problem we all must have several times a day.
I also don't like how in look up tables, the row data has to be part of the database design. DBIC makes it easy to deploy a DDL of a database based on the classes you define, but then you still need to go in and manually add a bunch of "insert into gender(name) values('female')" to the generated DDL file. This basically takes a nice and neat automated system and makes it manual again.
Plus right now we can't use DBIC Versioning to manage the differences in this row data. DBIC Versioning only handles DDL differences, which is as expected since you would imagine your actual data to be data, not logic or constraints. That means if you have changes in the lookups, you will need to add more inserts or updates to the DDL diff that DBIC makes for you. Also, if you need to do updates, this introduces possible pain regarding you primary key fields if you are using some sort of auto increment or uuid system to create these, since you might not know in advance the values in these fields.
It's been recommended to me that you can use an 'enum' type in the database instead of table look ups. That way you do indeed get to manage the data as part of the DDL, but you still end up having to use conditionals a lot for all the special rules.
At least with DBIC, you can mitigate this a bit by making classes and resultset constricted to a particular query (so you can make a Schema::Result(Set)::State::California that actually can hold the rules for California). This still feels a bit ugly to me since you still have to do all the manual table inserts.
Anyway, that's my Friday rant. Thoughts and suggests very welcomed. For such a common issue a commonly shared solution we can put into code would be fantastic.