cfqueryparam: it's not just for security-- also, when NOT to use it
Posted by
Brad Wood
Jul 26, 2008 16:33:00 UTC
I generally don't care to write about topics that have already had the stuffing blogged out of them. However, I've still seen some of these questions floating around and I figure it never hurts to have more than one place on the 'net talking about it. The two things I wanted to cover real quick are how cfqueryparam may (or may not) enhance the performance of your queries. Also, when does criteria in a query NOT need a cfqueryparam.The security side of cfqueryparam has pretty well been beaten into the ground, but there are other really good reasons to use it. Some of the SQL here that I will show you is specific to MSSQL, but most of these principles are pretty generic.
The Setup
Every time a SQL statement is run on your database, your DBMS compiles the statement into an execution plan based on the indexes available to use and latest data statistics. The query plan basically states in what order each table or index will be processed and how the results will be gathered. A simple select has a pretty basic plan, a giant select of doom with 10 tables and derived tables can get pretty hairy. MS SQL stores the plan as an XML file which can be viewed in a nice graphical format in Enterprise Manager. It is important to note that an execution plan will ALWAYS be used. There's no way around it. Generating the plan can also be costly depending on the complexity of the code. For this reason, your DBMS will cache a plan once it is generated in memory for later use if the same query comes up again. To get a list of cached plans in memory for MSSQL 2005 ordered by the most used, run the following select:[code]SELECT cache_plan.objtype, cache_plan.size_in_bytes, cache_plan.cacheobjtype, cache_plan.usecounts, sql_text.text FROM sys.dm_exec_cached_plans as cache_plan outer apply sys.dm_exec_sql_text (cache_plan.plan_handle) as sql_text ORDER BY cache_plan.usecounts DESC [/code]
The Problem
When a query gets run, your SQL server looks for a cached plan that matches before it compiles a new one.- Cached plans have to match pretty much exactly to be used
- Your DBMS will only keep so many plans. The unused ones will be purged from memory
[code]SELECT order_id FROM orders WHERE order_id = 1[/code]
[code]SELECT order_id FROM orders WHERE order_id = 2[/code]
[code]SELECT order_id FROM orders WHERE order_id = 3[/code]
[code]SELECT order_id FROM orders WHERE order_id = 4[/code]
The Fix
Now, what if we had used cfqueryparam. Our code would have looked something like this:[code]<cfquery name="qry_order_id" datasource="foo"> SELECT order_id FROM orders WHERE order_id = <cfqueryparam cfsqltype="CF_SQL_INTEGER" value="#url.order_id#"> </cfquery>[/code]All 100,000 page views would have used the exact same SQL statement because all your database sees is this:
[code]SELECT order_id FROM orders WHERE order_id = ?[/code]
- Imagine yow much processing time would have been saved by not compiling an extra 99,999 plans.
- Imagine how much memory would be saved by not storing an extra 99,999 plans!
The Catch (There's always one of these)
To be fair, I have to point out that not EVERY SQL statement will benefit from a generic, reusable, execution plan. My example above was pretty simple. Changes are there is an index on the order_id column and the query was a covered select that only required an index seek. The order_id column probably has equally disbursed data which can be arranged in ascending order very easily in your index. There are times though when your database will choose not to use a perfectly good index. If an index scan will be required and the database is going to have to scan through most of the entire index, and then do some bookmark lookups, it will have a good chance of saying "Screw the index, I'm just going straight to the table." And in that case it may very well be faster. Especially if the table is fairly small. Let's imagine our orders table has an order_state column to represent the state that the order was placed in and there is an index on that column. Your company is based out of Missouri so 90% of the records in the table are Missouri. Let's say you are searching for orders by state:[code]<cfquery name="qry_orders_by_state" datasource="foo"> SELECT order_id, state, cust_name, invoice_num FROM orders WHERE order_state = <cfqueryparam cfsqltype="CF_SQL_VARCHAR" value="#form.order_state#"> </cfquery>[/code]If you search for all your Missouri orders (most of the table), a table scan might be the most efficient way to get all those records (becuase you are returning almost the entire table). If you are searching for California orders (which you RARELY have) then your order_state index would be much faster since you are only pulling a couple records. Now, if SQL server is re-using a generic execution plan from the very first query that was compiled today, it is possible that some of your queries would have been stuck with a plan that didn't really fit, but a new one was NOT compiled. The affects of this are real, but often negligible. I just want you to know the problem has the potential to exist.
Not Necessary
Also, one other quick note on cfqueryparam. Since we have established its main uses as:- Separating parameters from SQL logic to prevent arbitrary text from being confused with SQL and executed
- Making your SQL statement generic so oft changing pieces will not cause constant recompiling of the plan
[code]<cfquery name="qry_open_orders_by_state" datasource="foo"> SELECT order_id, state, cust_name, invoice_num FROM orders WHERE order_state = <cfqueryparam cfsqltype="CF_SQL_VARCHAR" value="#form.order_state#"> AND order_status = 'OPEN' </cfquery>[/code]... then it is NOT necessary to parameterize the order_status like so:
[code]<cfquery name="qry_open_orders_by_state" datasource="foo"> SELECT order_id, state, cust_name, invoice_num FROM orders WHERE order_state = <cfqueryparam cfsqltype="CF_SQL_VARCHAR" value="#form.order_state#"> AND order_status = <cfqueryparam cfsqltype="CF_SQL_VARCHAR" value="OPEN"> </cfquery>[/code]That will work and it won't hurt anything, but it is unnecessary on your part since security and performance will not be affected by it. For more reading, Mark Kruger has some very nice articles over at ColdFusion Muse concerning execution plans. Check them out. http://www.coldfusionmuse.com/index.cfm/2005/6/28/cfqueryparam
charlie arehart
Great to see you write this, Brad. After seeing the flood of posts saying "always use CFQUERYPARAM". I wanted to write an entry like this myself. I was teaching this week so just didn't have time to write it up.
Yes, CFQUERYPARAM has its security benefit, and yes, it has its performance benefit most of the time, but we need to understand the potential negatives as well, like you pointed out, including how to recognize them, and how to mitigate them.
I'll note that I've made the same point (and shared means to identify and mitigate them) in a couple of talks I've given:
I may do my own blog entry with some of that info, because I agree with you that this needs to be said more widely. I realize it will sound heretical to some, and it also complicates the decision about how to best mitigate the security concerns, but the issues really need to be raised and better understood by folks.
Brad Wood
Thanks for the links, Charlie. Your site is a great resource.
Seth Feldkamp
Hmm... I'm a little late to this entry. I was wondering if you could elaborate on the differences (if any) between the caching of a stored procedures execution plan and a coldfusion query's execution plan. I had read somewhere that stored procedures offer a performance benefit over a query from the application server and I thought that this was due to the way it caches the execution plan. This entry makes me think that maybe I've misunderstood that.
Brad Wood
I'm hoping someone will correct me if I am wrong, but as far as MSSQL versions 2000 and 2005 I am fairly certain that there is absolutely no performance difference between a stored procedure and an ad-hoc query at face value. This quote from "SQL Server Books Online" says: http://msdn.microsoft.com/en-us/library/aa174792.aspx
SQL Server 2000 and SQL Server version 7.0 incorporate a number of changes to statement processing that extend many of the performance benefits of stored procedures to all SQL statements. SQL Server 2000 and SQL Server 7.0 do not save a partially compiled plan for stored procedures when they are created. A stored procedure is compiled at execution time, like any other Transact-SQL statement. SQL Server 2000 and SQL Server 7.0 retain execution plans for all SQL statements in the procedure cache, not just stored procedure execution plans.
What that means is stored procedures are NOT pre-compiled in MSSQL like some will tell you. They are compiled the first time they are run-- just like ad-hoc SQL. It also tells us that the execution plan for stored procs and ad-hoc SQL are cached in the same way.
This guy obviously feels strongly about the subject, but there is an interesting read here: http://weblogs.asp.net/fbouma/archive/2003/11/18/38178.aspx
The three main arguments I have heard that are mildly convincing to me personally are:
Seth Feldkamp
thanks Brad, I think I read that blog and the other one that inspired it. Maybe that's where my confusion came from.
Charlie Arehart
Not to counter what Brad says, but to respond to a different aspect of may be in Seth's question, about "I was wondering if you could elaborate on the differences (if any) between the caching of a stored procedures execution plan and a coldfusion query's execution plan". I know you were focused on whether there were performance differences, and as Brad notes, technically both do get compiled.
FWIW, I'll note that they do get compiled into separate caches, which can be significant (more on that in a moment). But I can also offer more information that backs up what Brad says (about how, internally, SPs and queries sent via CF can both at a low level in SQL Server end up being not too different.) I realize that does go against the grain of conventional wisdom.
Among the resources I mention in the links above is an awesome series of blog entries from members of the SQL Server team, and they have a LOT more information on all this sort of stuff related to query plan caching, both with respect to SPs, dynamic queries (as a CFQUERY select statement would make) and prepared statements (which is what a CFQUERYPARAM would cause). There are surprising things about how they're handled internally (much like Brad was alluding to in his original entry above).
I did an entry of my own that points to all of those entries in that series from the team. Consider it a table of contents: http://carehart.org/blog/client/index.cfm/2007/8/13/resources_on_sql_server_query_plan_cache
If you're interested in more about all this stuff, both how it worked in SQL Server 2000 and how it's changed in important ways in 2005, and subsequent SPs, this is all fascinating reading (but very thorough--not for reading on your ipod or phone).
As an example, the first of their entries discusses , at http://blogs.msdn.com/sqlprogrammability/archive/2007/01/08/plan-cache-concepts-explained.aspx, they start out making the point about the separate caching of SPs and dynamic SQL:
"There are two cache stores in which compiled plans are stored depending on the type of the compiled plan. If the query is dynamic sql or prepared, the compiled plan is stored in the SQL Plans (CACHESTORE_SQLCP) cache store. For modules like stored procedures, functions and triggers, the compiled plan is stored in Object Plans (CACHESTORE_OBJCP) cache store."
It and the other resources then go on to explain more about understanding plan caching, compilation, prepared statements, how to monitor them, optimize them, solve problems (like the ones Brad was alluding to) and much more.
I'll point out as well a couple other resources (that they link to), which are especially important about the original point Brad was making, of how CFQUERYPARAM doesn't ALWAYS make things better. As these explain, sometimes parameterized queries (or prepared statements as they're also sometimes referred to) may perform more poorly because of a confluence of unfortunate events:
http://blogs.msdn.com/queryoptteam/archive/2006/03/31/565991.aspx
http://www.microsoft.com/technet/prodtechnol/sql/2005/recomp.mspx
Again, the multipart series of entries I link to above offers still far more on all this. It's a deep subject, and proves once again that there are seldom easy, pat answers to solving a problem. It does make our job harder, but that's why they pay us the big bucks, right? :-)
Hope that helps some.
Brad Wood
Thanks a ton for the additional info Charlie!
Trevor
Still have some questions from the experts...
Does Val() protect all numeric values - (it's much cleaner than using cfqueryparam)
Does CreateODBCDate() protect dates in a where clause ? (also much cleaner)
Are there any security implications in not using <cfqueryparam> in an update query ? i.e. on the right side of a Set FieldName = 'Value'
We always use it because it handles quotes embedded in 'value', but I would like to know if SQL Server can be attacked through an update query in this way ?
Brad Wood
@Trevor: Good questions.
val() will most certainly provide protection around numeric values. Any malicious string passed into it will be silently converted to a number. (zero by default) It is important to note that val() will NOT raise an error, where cfqueryparam would. If invalid data is passed in, you may want to your error handling template to know about it so it can be logged and you notified.
CreateODBCDate() will stop malicious string from spilling over into your query. It will throw an error if the string passed in cannot be converted to a date.
Update queries are just as susceptible as select, insert, or delete queries. If you are outputting any variable into the SQL which is not scrubbed or simply bound as a parameter, nasty content could be imbedded there. For instance:
Update tbl_comments Set col1 = 'test', col2 = 4; drop table tbl_comments
I would caution you against walking the line of seeing when you can get away with not using cfqueryparam. If you make it a regular occurrence in your code I think it begins to look just as clean as anything else. Using it as a de facto standard means you never have to worry about if val() or CreateODBCDate() were enough. And of course as this post pointed out, you always get the potential performance benefit of a parameterized query.
Charlie Arehart
I'm a little confused: is that really Brad who wrote that last comment? :-)
You say, "I would caution you against walking the line of seeing when you can get away with not using cfqueryparam. If you make it a regular occurrence in your code I think it begins to look just as clean as anything else. Using it as a de facto standard means you never have to worry about if val() or CreateODBCDate() were enough. And of course as this post pointed out, you always get the potential performance benefit of a parameterized query."
But I thought the point pointed out that there was the corresponding potential detriment of using it, so that instead I'd have thought your reply might be, "I would caution you against just blatantly always using cfqueryparam."
You go on to say, "Using it as a de facto standard means you never have to worry about if val() or CreateODBCDate() were enough. " Well, yeah, but again doesn't that fly in the face of the very point of your entry above, which I was agreeing with in my first comment?
I mean, you go on to conclude with "You always get the potential performance benefit of a parameterized query." This is why I'm mystified. I thought you were standing up for the "be cautious about just always expecting a performance benefit". I just really thought you'd have added a more cautionary reply than this one.
Since people can say they're anyone on a blog entry, I'm almost tempted to wonder if that was really you, Brad. :-) Or have we somehow misunderstood what was your main point in the original blog entry here? :-)
BTW, I didn't myself offer any answer to Trevor because I just don't know the answer to his questions, which as you say are good. I've wondered them myself. I mean, especially if we want to be cautious about just blatantly always trusting CFQueryParam, then we need to know for sure what alternatives we can trust. I've not seen any good clarification on this, but I bet one exists somewhere. I hope someone may find and post it here. It feels like an alternative press for this contrarian perspective on CFQP, at least I thought it was until I read your last reply, that is. :-)
Brad Wood
Hey, who put that last post on my blog masquerading as me? Oh, wait, it was me-- never mind. :)
You're right Charlie, my response may have seemed a bit contradictory, but I'll admit I typed it rather hurriedly. I was actually walking past my computer dressed in my swimming suit with towel in hand about to go jump in our inflatable pool with my daughter when I saw the blog post, so I hammered something out pretty quickly.
I guess I should say that I have been teetering back and forth between a couple view points on cfqueryparam. My last paragraph probably shouldn't have referred to this post, as you are correct-- it largely addressed some reasons why CFQP could be a benefit OR detriment to performance. By the same rite, this post could also be taken as more proof that you should always use CFQP even if not for security. I'm never one to jump on bandwagons, and it's been hard to miss the large CFQP-wagon that's been lumbering through CF-Ville recently.
Despite the possible draw-backs of CFQP, which I always enjoy pointing out as a devil's advocate, I have been nearing the conclusion that all things being equal, one might be safest to make a steady habit of using parameterized queries and then address any other problems as they arise since it seems that the majority of cases will be fine with CFQP.
Functions like val() and CreateODBCDate() can be effective in protecting your code with out the use of CFQP, but I personally think it is easier to scan down a cfquery block and know it is secured when I have one consistent tag I am looking for. Using a combination of techniques might make it a bit harder to notice when you've missed something.
I suppose the majority of my response was a bit of a knee-jerk reaction to an attitude I've seen a on the talk list as of late that seems to go something like this: "I know CFQP will largely protect my parameters from SQL injection, but I have a lot of code and don't want to go through a lot of trouble. What's the smallest about of work I can do to protect myself?"
In other words, what can I get away with NOT doing? If I put this filter in Application.cfm does that mean I don't need to protect my actual queries?
That was really what I was trying to combat. Whatever method you use should obviously be thought through and carefully implemented. I'm not into silver bullets. The "always and never fairy" seem to constantly have their own caveats, but I guess I wanted to make sure the questions were real research and not the formulation of an excuse not to do due diligence to the security of one's code.
Charlie Arehart
Sure, and that's indeed wise counsel. Still, this is one of the few places where the counterpoint to be careful about CFQP is being stated, and I really think it bares stating. But, yes, I can see why some may prefer to default to "just always using it". I only mean that people need to be aware of the potential negatives, the very point of your post.
Hardly anyone is stating that--and I'd bet only a fraction of people reading blogs and lists are hearing this message. That's no slam against you. Heck, I know that my blog also reaches just a tiny percent of even the well-read audience. People just can't read every blog out there. It's just that for every 100 people crowing "always use CFQP", I bet only 1 might be savvy to the potential performance risk.
My only concern is that people be aware of it, and via the info I've shared in the resource links above, learn how to detect and address it. As you say, we have to stay on guard for the siren song of the "always and never fairy". :-)
Charlie Arehart
I think a statement I made there could be misconstrued. I said, "Heck, I know that my blog also reaches just a tiny percent of even the well-read audience. People just can't read every blog out there."
By "even the well-read audience", some may think I mean "they'd read mine" and not yours, but of course I didn't mean that. That's why I said "I know my blog also reaches just a tiny percent". I was addressing my own concern about the blogging world: many think that if something's put out there, people will know it. But only a tiny fraction of a percent of all CFers read any blogs, and of those many read only a few. We who are not in that few (and I count myself as one not in that few) can come up with great content but it may rarely be read. So instead "conventional wisdom" is about all that gets shared and repeated, without regard to important nuances.
It's really lamentable, and I've long been trying to think of a way to solve it. We have lots of great content being created, but it's across hundreds of blogs. Sure, we have aggregators to bring them all together, but you can't keep an eye on all they feed. Fortunately, we do have Kay and Steve highlighting a few entries per week (http://carehart.org/resourcelists/tools_to_consider/#cfbloghighlights), but that then is filtered on their interests (and only a few per week), though I think they do a great job. And some have pointed to http://coldfusion.dzone.com/ as an approach to highlight key posts, but it's just not taken off. The 15 things listed there go back to Feb 08, and there have certainly been more newsworthy entries since then (as Steve and Kay's entries show). Just a sad state of affairs, I feel.
Anyway, Brad, again, thanks for taking up the flag on this issue. Keep on doing what you do. :-)