Does OOP Encourage RBAR?
Posted by
Brad Wood
Aug 17, 2008 03:56:00 UTC
I've been staring at my computer monitor trying to start this paragraph for about 10 minutes now, so I'm just going to start typing. I've been thinking about system design. Organization versus performance. Design patterns versus efficiency. We adopt some code without thought to its performance because we're talking about milliseconds, and the readability, organization, and structure gains our apps enjoy are well worth it. I'm not sure that is always the case though, and we don't notice it until too late.
When building something, I like to periodically step back from my work and think about what I have created. What is it doing? How straight of a line is it stringing between my two points?
In one particularly large app I worked on, we modularized things to the Nth degree. Dozens of individual include files existed that displayed a small set of related data. In order to make them all self contained, they ran all the queries necessary to display their data. You could include any file on any screen and it would work. After some time, we took a break to examine some performance aspects of our app. We found some pages running many database calls that ideally would have been combined into a single trip to the database, or worse, the exact same SQL being run multiple times in different modules.
From a holistic perspective all the data on a given screen could have been extracted in just a small number of efficient database calls, but given the dynamic nature of many pages, it would have been very difficult to maintain the self sufficiency and agnostic nature of each module. Our ultimate solution was to find the most used queries and employ a short lived caching mechanism to cut down on the repetitive calls in the same page. It's like we had cursed ourselves. The beauty AND downfall of our code was that the right hand didn't know what the left hand was doing. Had we created ultimate flexibility, or needless redundancy?
So, let's get back around to my title, "Does OOP Encourage RBAR?"
If you don't know what OOP stands for, you've been living under a rock. You may have heard of RBAR, or Row By Agonizing Row. The term describes when you tackle a problem one record at a time in a very procedural way instead of dealing with it in a set-based fashion. Let's say you need to take all open orders that are more than 6 months old and cancel them. The most procedural way you could accomplish that would be to query the entire contents of the orders table, loop over every record one at a time, and if the created date was old enough run a single update statement for each order to change the status one record at a time. The set-based way to accomplish this would be a single update statement that updated the order status column with a where clause that filtered on open orders older than 6 months.
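To make the contrast concrete, here is a minimal sketch of both approaches in Python with an in-memory SQLite database. The table layout, column names, and sample rows are assumptions made up for illustration, not anything from a real app; dates are stored as ISO strings so they compare correctly as text.

```python
import sqlite3


def make_db():
    """Build a tiny, hypothetical orders table for the comparison."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE orders (id INTEGER PRIMARY KEY, status TEXT, created TEXT)"
    )
    conn.executemany(
        "INSERT INTO orders (status, created) VALUES (?, ?)",
        [
            ("open", "2007-01-15"),
            ("open", "2008-08-01"),
            ("shipped", "2007-02-01"),
            ("open", "2007-11-30"),
        ],
    )
    conn.commit()
    return conn


def cancel_rbar(conn, cutoff):
    """RBAR: fetch every row, decide in application code, update one row at a time.

    Returns the number of SQL statements sent to the database.
    """
    statements = 1  # the initial SELECT of the whole table
    rows = conn.execute("SELECT id, status, created FROM orders").fetchall()
    for order_id, status, created in rows:
        if status == "open" and created < cutoff:
            conn.execute(
                "UPDATE orders SET status = 'cancelled' WHERE id = ?", (order_id,)
            )
            statements += 1
    conn.commit()
    return statements


def cancel_set_based(conn, cutoff):
    """Set-based: one UPDATE whose WHERE clause does the filtering.

    Returns the number of rows the single statement changed.
    """
    cur = conn.execute(
        "UPDATE orders SET status = 'cancelled' "
        "WHERE status = 'open' AND created < ?",
        (cutoff,),
    )
    conn.commit()
    return cur.rowcount
```

Both functions leave the table in the same state. The difference is that the first one sends a statement per matching order plus the initial select, while the second sends exactly one statement no matter how many orders match.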
But your business logic doesn't belong in the database though. No, you deal only with objects. You ask them to do the work for you. You have an order component with a getOrderDate() method and a setOrderStatus() method. Our good coding conventions have taught us to never directly access the database from all over our app, but to encapsulate all that code in a single, reusable component. Now, additional logic such as archiving off data for non-active orders and updating the last tracking point on the order can all be taken care of by the singular piece of business logic in our component-- our perfect little world of code that knows exactly how to cancel a single order.
That's the problem though-- our component only handles one order at a time. Now, we need to cancel 300,000 orders and suddenly it is taking forever. We are opening and committing more transactions than we really need to. Our audit triggers are being run 300,000 times with a single record, instead of once with 300,000 records. Our transaction logs are filling up faster. We're bugging our database 300,000 times when we could just tell it something once and affect the same set of data. We could create some sort of order maintenance component which would have the capability of updating multiple orders at once, but now it must be endowed with all the logic that the order component has-- archiving, last tracking points, etc. That would be duplicated logic though.
So what's the solution to this? I'm certainly not of the mind to write off OO design patterns. I just keep finding this disconnect that seems to happen between my perfect little objects on the web server, that somehow at the end of the day need to find their way back to the database; a set-based storage system that most always performs best when dealing with your data as a set-- not Row By Agonizing Row.
Perhaps you could have a DAO of sorts that would accept multiple instances of an object and find the most efficient way to store them all at once. Maybe most people don't care if they make 300,000 separate updates, instead of one. Maybe ORM design patterns help solve some of this. (I'm not too familiar with all that yet though.) Open my eyes, please. Tell me how you deal with this.
marc esher
hey brad,
if i ever write code that creates 300,000 objects and asks them to update themselves when a single sql update would do, take away my keyboard and give me a walmart smock.
i can't imagine any sensible, experienced system designer suggesting that you'd use objects in this manner. has someone actually told you that this kind of thing would be preferred?
Brad Wood
@Marc: I would hope you wouldn't write that code, but where do you draw the line? The knee jerk reaction of every OOP guide you see appears to be "create a single object that models the behavior of your entity". Just because you need the ability to deal with a single instance of an object doesn't preclude the possibility of needing to deal with multiple instances of it in the future. You may even start out building a page which only deals with a few objects, but as your data grows, it may be doing more than you ever expected. What do you do to reconcile that? What is this? Use OOP, until it doesn't make sense anymore and then switch over to just duplicating the code elsewhere?
I'm really just asking the question because I don't know. :)
Brian Swartzfager
Brad, I understand where you're coming from. There's no shortage of blog posts on creating and managing bean-like objects, but nowhere near as much discussion about the best "OO" way of handling multi-record transactions.
My sense of it is that multi-record inserts, updates, and deletes usually end up in the Gateway object along with the select queries you need. There is no question that a cfquery statement is a far more quick and efficient way of affecting multiple records in the same table than looping through a set of objects.
Dan Wilson
Perhaps the question is less about OOP vs RBAR and more about designing the right object.
In your example, it seems like the 300,000 records are all part of simplistic criteria. This would be handled easily enough in a single query stuffed in an object.
On the other hand, I can see where it might be such that the logic that decides which orders require an update might be encapsulated in an object. If this were the case, and the potential to update 300,000 records was evident, I would create a mechanism to pool the record markers for each record needing update and then do the actual update in a single statement.
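One way to read the pooling idea above, sketched in Python with SQLite. The class name `PendingCancellations`, the `should_cancel` rule, and the table layout are all hypothetical stand-ins for illustration. The ids are flushed in chunks rather than in one literal statement, which is an assumption added here because most drivers cap the number of bound parameters.

```python
import sqlite3


class PendingCancellations:
    """Collects order ids while object-level logic runs, then updates them together."""

    def __init__(self, conn, chunk_size=500):
        self.conn = conn
        self.chunk_size = chunk_size  # keeps each IN (...) list to a modest size
        self.ids = []

    def mark(self, order_id):
        """Record that an order needs cancelling; no database call happens here."""
        self.ids.append(order_id)

    def flush(self):
        """Send one UPDATE per chunk of marked ids and return the rows changed."""
        updated = 0
        for start in range(0, len(self.ids), self.chunk_size):
            chunk = self.ids[start:start + self.chunk_size]
            placeholders = ",".join("?" for _ in chunk)
            cur = self.conn.execute(
                f"UPDATE orders SET status = 'cancelled' WHERE id IN ({placeholders})",
                chunk,
            )
            updated += cur.rowcount
        self.conn.commit()
        self.ids.clear()
        return updated


def should_cancel(status, total):
    # Hypothetical business rule standing in for logic that lives in an object.
    return status == "open" and total < 10.0


def run_demo():
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE orders (id INTEGER PRIMARY KEY, status TEXT, total REAL)"
    )
    conn.executemany(
        "INSERT INTO orders (id, status, total) VALUES (?, ?, ?)",
        [(1, "open", 5.0), (2, "open", 50.0), (3, "shipped", 5.0), (4, "open", 9.5)],
    )
    pending = PendingCancellations(conn)
    rows = conn.execute("SELECT id, status, total FROM orders").fetchall()
    for order_id, status, total in rows:
        if should_cancel(status, total):  # the decision stays in application code
            pending.mark(order_id)
    updated = pending.flush()  # the write happens as a set
    statuses = dict(conn.execute("SELECT id, status FROM orders").fetchall())
    return updated, statuses
```

The decision about each order still happens one record at a time, but the database only hears about the result once per chunk instead of once per order.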
OOP is all about creating modular parts that function smoothly.
DW
Peter Bell
I think it is important to either have first class domain concepts for collections or to use something like a service class to handle this sort of work. I'd have an OrderService.updateOldOrders() that'd call OrderDAO.updateOldOrders() which would have a single SQL query for handling this as a set based operation on a collection.
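A rough sketch of that layering in Python with SQLite, for readers who want to see the shape of it. The method names follow the ones mentioned above, converted to Python naming; the `max_age_days` setting, the table layout, and the ISO date strings are assumptions for illustration only.

```python
import sqlite3
from datetime import date, timedelta


class OrderDAO:
    """Data access layer: the only place that knows the SQL."""

    def __init__(self, conn):
        self.conn = conn

    def update_old_orders(self, cutoff):
        # A single set-based statement; cutoff is an ISO date string.
        cur = self.conn.execute(
            "UPDATE orders SET status = 'cancelled' "
            "WHERE status = 'open' AND created < ?",
            (cutoff,),
        )
        self.conn.commit()
        return cur.rowcount


class OrderService:
    """Service layer: owns the business rule (how old is too old)."""

    def __init__(self, dao, max_age_days=180):
        self.dao = dao
        self.max_age_days = max_age_days  # hypothetical setting, roughly 6 months

    def update_old_orders(self, today):
        cutoff = (today - timedelta(days=self.max_age_days)).isoformat()
        return self.dao.update_old_orders(cutoff)


def build_sample_connection():
    """Create a small, made-up orders table to exercise the two classes."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE orders (id INTEGER PRIMARY KEY, status TEXT, created TEXT)"
    )
    conn.executemany(
        "INSERT INTO orders (id, status, created) VALUES (?, ?, ?)",
        [(1, "open", "2007-01-15"), (2, "open", "2008-08-01")],
    )
    conn.commit()
    return conn
```

Calling code would only ever talk to the service, for example `OrderService(OrderDAO(conn)).update_old_orders(date.today())`, and never sees the SQL.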
OO design patterns are a great set of tools for solving a wide range of problems. All other things being equal a good OO design will often sacrifice some performance for maintainability. Given the relative costs of hardware and programmers that is usually a good trade off. That said, when you learn OO that doesn't mean you should throw common sense out the window. Look at the design forces for a specific use case and select the most appropriate patterns for solving the problem - whether it be a trigger in the db, a stored procedure, a set based method in a DAO or Gateway or a collection of row-by-row operations.
Brad Wood
Thanks for the thoughts guys. I don't think there is a clear-cut answer for many programming questions so I enjoy hearing other people's views.
I'm not sure how much I like the concept of the maintenance function existing with a large update stuffed into it. I don't know if that is really OO, or just trying hard to look like it. I have come to the conclusion that using objects does not necessarily mean your programming is object oriented. In fact, I have seen some very procedural Java code which "uses objects". Regardless, my main concern is just the duplication of code. I touched on additional processing that might need to happen like archiving and such. I don't like to have code in two places that does the same things. To combat that, I have found myself writing procedures capable of handling 1 OR MORE records being inserted or updated. Then the same proc can handle a single update with the same code it handles 10 updates with. MSSQL's XML capabilities can be really handy for passing in a "set" of data in a single call.
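Here is a minimal sketch of the "1 OR MORE records" idea, written in Python against SQLite rather than as an MSSQL stored procedure, so treat it as an analogy and not the real thing. A temporary table stands in for the XML "set" passed into the proc, and the `order_archive` table and its columns are hypothetical. The point is that archiving and the status change live in one place and run the same way for one id or many.

```python
import sqlite3


def make_db():
    """Build small, made-up orders and archive tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE orders (id INTEGER PRIMARY KEY, status TEXT);
        CREATE TABLE order_archive (order_id INTEGER, old_status TEXT);
        INSERT INTO orders (id, status)
        VALUES (1, 'open'), (2, 'open'), (3, 'shipped'), (4, 'open');
        """
    )
    return conn


def cancel_orders(conn, order_ids):
    """Cancel one or more open orders with the same set-based statements.

    Returns how many orders were actually cancelled.
    """
    with conn:  # a single transaction for the whole set
        # The temp table plays the role of the "set" handed to the procedure.
        conn.execute(
            "CREATE TEMP TABLE IF NOT EXISTS to_cancel (id INTEGER PRIMARY KEY)"
        )
        conn.execute("DELETE FROM to_cancel")
        conn.executemany(
            "INSERT OR IGNORE INTO to_cancel (id) VALUES (?)",
            [(order_id,) for order_id in order_ids],
        )
        # Extra processing (archiving) is written once, for the whole set.
        conn.execute(
            "INSERT INTO order_archive (order_id, old_status) "
            "SELECT id, status FROM orders "
            "WHERE status = 'open' AND id IN (SELECT id FROM to_cancel)"
        )
        cur = conn.execute(
            "UPDATE orders SET status = 'cancelled' "
            "WHERE status = 'open' AND id IN (SELECT id FROM to_cancel)"
        )
    return cur.rowcount
```

A single-order cancel is just `cancel_orders(conn, [42])`, so there is no second copy of the archiving logic to keep in sync.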
As far as triggers, I don't know if I like them for the most part on a matter of principle. (There are some places I do like to use them-- like audit tables) Maybe I think I've just seen them abused. I've seen a sort of lowest-common-denominator develop in databases where certain large amounts of business logic in a lower tier of your application sort of invites (or requires) other logic to that level. Let's say you E-mail outstanding account payables to accounting every time you cancel the order. Your DBA's first solution might be to place a trigger on the orders table and send the E-mail from there so you can be guaranteed it always happens. Immediately problems arise though. Now you have no immediate way of firing that event from a higher tier in your app based off of other criteria, your trigger does not have access to any other business logic written in a CF or Java layer, and your Database is sending out E-mails. Ewww. (I've also come to the conclusion that "just because you can doesn't always mean you should") The whole mess of logic just sounds like something the business objects should deal with, not the database. I guess I'm a bit of an idealist though.
At any rate, I kind of like Dan's idea to have a mechanism that keeps track of the data needing to be updated and then updates it all at once. It could get a little complicated though-- I'll have to mull over it a bit.
I also agree with you Peter that there is always a trade off and you should stick with what makes the most sense in your situation. That's why I like to hash over these scenarios and get ideas so I'm better armed to code smartly and efficiently. Thanks again guys.
Jeff Moden
>>(Brad Wood said in the article) But your business logic doesn't belong in the database though. No, you deal only with objects.
That's a very common misconception. Business logic is the nebulous layer of code that straddles both the Presentation Layer and the Data Layer. Think of "constraints" on a table to ensure NULL values aren't entered or Foreign Keys that guarantee that some other item already exists prior to entry of a new item somewhere. Patently, that's business logic in the Data Layer.
A requirement to completely fill out one screen before proceeding to the next screen is a business requirement enforced by the Presentation Layer.
There is no limit on where Business Logic should actually live. You have to put the Business Logic where it will be most effective.
>>(Dan Wilson said) Perhaps the question is less about OOP vs RBAR and more about designing the right object.
Precisely and well said. Lots of people forget that there are two types of work that may need to be kicked-off by the Presentation Layer... things that are necessarily "Record" oriented and things that are "Column" or "Set" oriented. If it's "Set" oriented, then there should be an "object" to handle the work on the "Set" and that "object" should be a call to a stored procedure.