Data::Whatever

In my day job, I developed a module that makes access to complex Perl data structures easier. I couldn't find any module on CPAN that did exactly what I wanted. I needed to expose relatively complex networks of Perl references to non-technical end users, and make it easy for them to understand the structure and search within it. The consumer modules expose Solaris 10 fmd event telemetry information to Perl programs. After writing modules to gather information from the telemetry logs, I planned to provide filtering abilities, which boils down to searching for particular structure 'fingerprints' within the telemetry, based on paths through the Perl data structure. Rather than make users write (and comprehend) unwieldy paths like '$top->{key}[0]{key}{key}[0]', which have far too much punctuation for people who aren't used to programming, I wanted to have them write paths like 'key[0].key.key[0]'. This seemed much cleaner visually, and therefore easier to understand.
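To make the two notations concrete, here is a minimal sketch of how a friendly path could be resolved against a structure without resorting to eval. It is not the module's actual code: the function names parse_path and resolve_path and the sample data are invented for illustration, and it assumes keys contain no dots or square brackets.

```perl
use strict;
use warnings;

# Hypothetical helper: split 'key[0].key' into steps such as
# ['key', 'key'], ['idx', 0], ['key', 'key'].
sub parse_path {
    my ($path) = @_;
    my @steps;
    for my $part (split /\./, $path) {
        # Each part is a key name followed by zero or more [N] subscripts.
        my ($key, $subs) = $part =~ /^([^\[\]]+)((?:\[\d+\])*)$/
            or die "Cannot parse path component '$part'\n";
        push @steps, [ 'key', $key ];
        while ($subs =~ /\[(\d+)\]/g) {
            push @steps, [ 'idx', $1 ];
        }
    }
    return @steps;
}

# Hypothetical helper: follow the steps from $top, returning undef
# (in scalar context) as soon as a step does not fit the structure.
sub resolve_path {
    my ($top, $path) = @_;
    my $node = $top;
    for my $step (parse_path($path)) {
        my ($type, $name) = @$step;
        if ($type eq 'key') {
            return unless ref($node) eq 'HASH' && exists $node->{$name};
            $node = $node->{$name};
        }
        else {
            return unless ref($node) eq 'ARRAY' && $name < @$node;
            $node = $node->[$name];
        }
    }
    return $node;
}

# Invented sample data, not real telemetry.
my $top = { host => [ { name => 'alpha', ports => [ 22, 80 ] } ] };

# Equivalent to $top->{host}[0]{ports}[1]
print scalar(resolve_path($top, 'host[0].ports[1]')), "\n";
```

Walking the steps by hand, rather than rewriting the string and eval'ing it, means a malformed or missing path simply fails instead of executing arbitrary text.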

Now, it is not always the case that users filter using a complete path -- otherwise it would be easy to just do a search and replace on the cleaned-up path string, turning it into a real Perl structure reference, then eval the result. However, I wanted to enable users to search on partial key paths -- rooted at the leaves of the structure. After a lot of whiteboarding and testing, I ended up with a module (internally known as Ops::Struct) which essentially walked the entire reference network, looking for N matching paths, optionally searching scalar key values with a caller-supplied regex. The algorithm I arrived at in the prototype was very complex -- honestly, I wrote it before I took CS106X at Stanford and truly learned how to design and analyze recursive and mutually recursive algorithms. It did take advantage of recursion, but didn't have anything like a simple base case -- the algorithm was ~150 lines of code. It worked, and has since stood the test of over a year of heavy production usage with no major issues. However, I've never been happy with it, or with how maintainable it was. I have long intended to clean this module up and release it on CPAN, as I have found it to be quite useful.
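As a rough illustration of what "partial paths rooted at the leaves" means, the sketch below matches a partial path against the tail of each full leaf path, with an optional regex on the leaf value. This is not how Ops::Struct does it (that module walks the reference network itself). Here the leaves are assumed to be already flattened into a hash of path => value, and path_ends_with and filter_leaves are made-up names.

```perl
use strict;
use warnings;

# Hypothetical helper: true when $partial matches the end of $full
# and begins on a key boundary (start of the path or just after a dot).
sub path_ends_with {
    my ($full, $partial) = @_;
    return $full =~ /(?:^|\.)\Q$partial\E\z/ ? 1 : 0;
}

# Hypothetical helper: return the sorted leaf paths that end with
# $partial and, if $value_re is given, whose value matches it.
sub filter_leaves {
    my ($leaves, $partial, $value_re) = @_;
    my @hits = grep {
        path_ends_with($_, $partial)
            && (!defined($value_re) || $leaves->{$_} =~ $value_re)
    } keys %$leaves;
    @hits = sort @hits;
    return @hits;
}

# Invented sample data: full leaf paths and their scalar values.
my %leaves = (
    'host[0].name'     => 'alpha',
    'host[0].ports[0]' => 22,
    'host[1].name'     => 'beta',
);

# Every leaf whose path ends in 'name' and whose value matches /^al/
print join(', ', filter_leaves(\%leaves, 'name', qr/^al/)), "\n";
```

In this sketch a partial path must include any trailing subscript, so 'ports' alone does not match 'host[0].ports[0]'; a real implementation might well choose differently.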

The other day, I was testing a new feature of a script which uses modules built on top of the Struct module. I noticed some strange behavior, and ended up finding a relatively obscure bug that forced me to directly handle an edge case in the code. Instead of just adding that edge case to the already complex function, I decided that there was no time like the present to put all of the thinking I have done on complex data handling to use. A week of work from home while nursing a cold provided the ideal opportunity to work out a better way of solving this problem.

The solution was surprisingly simple, taking advantage of mutual recursion to walk specific types of references -- the array ref walk function recurses into itself, or calls the hash ref walk function, and vice versa. This reduced most of the functionality of the original Struct module to about 30 lines of code, and I believe that, once I get a chance to compare performance, it will also show a significant speedup. At the very least, the win is in reduced code complexity -- so far, the branch/condition test coverage is upwards of 95%, which would never have been possible with the old code. The new module also has a smaller memory footprint.
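The shape of that mutual recursion can be sketched as follows. This is only an approximation under my own assumptions, not the rewritten module: the names walk_hash, walk_array, visit and flatten are invented, the two walkers reach each other through a small shared dispatcher rather than calling each other directly, and anything that is not a hash or array ref is treated as a leaf.

```perl
use strict;
use warnings;

# Walk a hash ref, extending the friendly path with '.key'.
sub walk_hash {
    my ($hash, $prefix, $found) = @_;
    for my $key (sort keys %$hash) {
        my $path = $prefix ne '' ? "$prefix.$key" : $key;
        visit($hash->{$key}, $path, $found);
    }
    return;
}

# Walk an array ref, extending the friendly path with '[N]'.
sub walk_array {
    my ($array, $prefix, $found) = @_;
    for my $i (0 .. $#$array) {
        visit($array->[$i], $prefix . "[$i]", $found);
    }
    return;
}

# Dispatch on the child's type; a non-hash, non-array value is the
# base case and gets recorded under its full path.
sub visit {
    my ($child, $path, $found) = @_;
    if    (ref($child) eq 'HASH')  { walk_hash($child, $path, $found) }
    elsif (ref($child) eq 'ARRAY') { walk_array($child, $path, $found) }
    else                           { $found->{$path} = $child }
    return;
}

# Entry point: return a hash ref of friendly path => leaf value.
sub flatten {
    my ($top) = @_;
    my %found;
    visit($top, '', \%found);
    return \%found;
}

# Invented sample data, not real telemetry.
my $top    = { host => [ { name => 'alpha', ports => [ 22, 80 ] } ] };
my $leaves = flatten($top);
print "$_ = $leaves->{$_}\n" for sort keys %$leaves;
```

Each walker only knows how to iterate its own container type, which keeps the base case trivial. Empty hashes and arrays contribute no leaves in this sketch.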

I plan to release the rewritten module to CPAN shortly, but I'm stuck on a name. At the moment, it's called Data::Easy, as it's an easy way of dealing with complex data structures. I'm also considering Data::Structure, Data::EasyKey, and Data::Friendly -- but I can't decide. Maybe someone who's read this far can offer a suggestion. I do think it belongs in Data:: at the very least.

About this Entry

This page contains a single entry by Cory published on April 17, 2006 11:18 PM.
