Stop saying input validation
Friday, August 3rd, 2007
It seems like almost everywhere you turn for advice about securing programs or resolving known security problems, you find a ‘security guy’ telling you something along the lines of ‘well, you have to validate your inputs to prevent these kinds of issues’.
Perhaps I’ve heard it too many times or perhaps I’m just jaded, but I’m throwing the BS card. Of course, I’d never leave it at just that… I think I’ve got a pretty good case for why it’s BS.
Consider my favorite red-headed stepchild, cross-site scripting (XSS). The mechanics of this problem are simple: an application accepts some input data and then offers that data back to a user as output without checking the content of the data along the way (fundamentally, this is the case for both reflected and stored XSS problems).
Now, consider a small flashback I’m about to have to a Computer Networking & Communications class from my undergrad days. I know serial communications are sooo last week, but anyone remember putting together simple protocols to transfer data over a line? In a simple message-based protocol, you’d pick a few byte values to represent control commands like ‘end of message’ or ‘close this channel down’. This seemed like a great plan until you tested it out and noticed that some messages were getting truncated in weird ways and occasionally the whole channel went down. If you didn’t just chalk it up to bit-gnomes and listened to the professor, what you learned was that since you intermixed the CONTROL channel with the DATA channel, your data was inadvertently being interpreted as control commands whenever those byte values showed up in the data being transferred. Hopefully, you then learned that to make the protocol reliable, you needed a mechanism to escape data containing values that would otherwise be interpreted as control codes.
How’d you implement the fix? Well, you certainly didn’t try to trace the origin and content of every byte that might enter a message. What you did was augment the send_message() function with logic to zip through the pending message and escape anything that was a control code, and then you’d do the normal stuff of writing it to the wire.
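For anyone who skipped that class, here is a minimal sketch of the escaping idea in Python. The control byte values, the XOR-with-0x20 escaping scheme, and every helper name other than send_message() are assumptions made up for illustration, not a reconstruction of any particular protocol.

```python
# Hypothetical control bytes for an illustrative message-based protocol.
END = 0x04    # 'end of message'
CLOSE = 0x18  # 'close this channel down'
ESC = 0x1B    # marks the next byte as escaped data

CONTROL_BYTES = {END, CLOSE, ESC}


def frame_message(payload: bytes) -> bytes:
    """Escape control bytes in the payload, then append the END marker."""
    out = bytearray()
    for b in payload:
        if b in CONTROL_BYTES:
            # Send ESC followed by a transformed byte that is not a control byte.
            out.append(ESC)
            out.append(b ^ 0x20)
        else:
            out.append(b)
    out.append(END)
    return bytes(out)


def unframe_message(frame: bytes) -> bytes:
    """Reverse frame_message: read up to the first unescaped END marker."""
    out = bytearray()
    it = iter(frame)
    for b in it:
        if b == END:
            return bytes(out)
        if b == ESC:
            nxt = next(it, None)
            if nxt is None:
                raise ValueError("dangling escape byte at end of frame")
            out.append(nxt ^ 0x20)
        else:
            out.append(b)
    raise ValueError("frame has no end-of-message marker")


def send_message(payload: bytes, write) -> None:
    """Escape at the point of sending; `write` is any callable taking bytes."""
    write(frame_message(payload))
```

Notice that the escaping lives in the send path. Nothing upstream has to know which byte values are dangerous on this particular wire.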
Let’s pop back to reality and our XSS problem. The problem is that we’re mixing user DATA with our application’s CONTROL and allowing the user data to be interpreted as control commands (in this case, HTML and/or JavaScript elements that run in the victim’s browser). I can understand how input validation might help for common, pathological cases of XSS vulns, but in no way is it a complete or adequate solution, because what’s really important (and the correct chokepoint for fixing the issue) is the point of output. Yeah, we need to ensure that every time we put content into the user-bound HTML stream, we first encode it to be safe in that control channel/language.
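As a rough illustration of encoding at the point of output, here is a sketch using html.escape() from Python’s standard library. The render_comment() helper and the markup around it are hypothetical. The encoding shown is only appropriate for HTML element content and quoted attribute values; data headed into a script block or a URL needs a different encoder.

```python
import html


def render_comment(comment: str) -> str:
    """Encode user data at the moment it enters the HTML stream.

    render_comment is a hypothetical helper; the stored comment stays raw
    and is only encoded here, for this specific output context.
    """
    return '<p class="comment">' + html.escape(comment, quote=True) + "</p>"


print(render_comment("<script>alert(1)</script>"))
```

The browser now displays the script tag as text instead of running it, and the user’s original characters are still intact wherever the comment is stored.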
This is not limited to XSS. SQL injection (or really any injection attack) is about taking input data and passing it with unchecked contents to a DB command (or any API with a control language of its own, e.g. LDAP queries). Again, input validation can help in simple cases, but you’ve gotta know a priori all the ways in which data might be used by an app and choose some kind of mutually safe set of characters to let through. Either that, or encode the potentially unsafe characters at input time so they don’t cause trouble somewhere down the line. Neither option is very extensible, usable, or maintainable in many circumstances, since both are overly restrictive and fragile in the face of change.
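For the SQL case, the same principle of keeping DATA out of the CONTROL channel is usually handled with bound parameters rather than hand-rolled escaping. Here is a sketch using Python’s built-in sqlite3 module; the users table and the find_user() helper are invented for the example.

```python
import sqlite3


def find_user(conn: sqlite3.Connection, name: str):
    """Look up users by name with a bound parameter.

    The `?` placeholder keeps `name` in the data channel, so quotes in the
    value are never parsed as SQL. find_user and the users table are
    hypothetical.
    """
    cur = conn.execute("SELECT id, name FROM users WHERE name = ?", (name,))
    return cur.fetchall()


# Illustrative in-memory database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO users (name) VALUES (?)", [("alice",), ("o'brien",)])

print(find_user(conn, "o'brien"))        # a legitimate name containing a quote
print(find_user(conn, "x' OR '1'='1"))   # a classic injection attempt
```

No character had to be banned from the name field to make this safe, which is the point: the apostrophe is just data.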
Take, for instance, a ‘Comments’ field in a web app. Many real-world applications really do need to allow users to use characters like ‘$’, ‘%’, ‘<’, or ‘>’ to represent things like money, percentages, and value comparisons, so I simply reject the idea of banning those characters, because it’s unnecessary and indicative of a misunderstanding of the real problem. Some folks (who shall remain nameless) have said, ‘well, if you need those characters, just HTML/URL encode them as part of the input validation process and you’re all set’. Well, now you’ve added another problem: you’ve gotta go decode and re-encode appropriately for every other output vector. To supporters of this strategy I ask, how many loading dock foremen and warehouse employees do you know that would correctly interpret a printout containing ‘Boxes &amp; packing must be &lt;50lbs’? How about ‘Boxes%20%26%20packing%20must%20be%20%3C50lbs’?
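One way to picture the alternative is to store the comment exactly as the user typed it and pick the encoding per output vector. The sketch below uses html.escape() and urllib.parse.quote() from the Python standard library; the three for_*() helpers are hypothetical names for the output paths an app might have.

```python
import html
from urllib.parse import quote

# The comment is stored exactly as entered; nothing is encoded at input time.
raw_comment = "Boxes & packing must be <50lbs"


def for_html(text: str) -> str:
    """Encode for an HTML page (hypothetical output path)."""
    return html.escape(text)


def for_url(text: str) -> str:
    """Encode for use as a URL component (hypothetical output path)."""
    return quote(text, safe="")


def for_printout(text: str) -> str:
    """A plain-text printout has no control language, so no encoding."""
    return text


print(for_html(raw_comment))      # Boxes &amp; packing must be &lt;50lbs
print(for_url(raw_comment))       # Boxes%20%26%20packing%20must%20be%20%3C50lbs
print(for_printout(raw_comment))  # Boxes & packing must be <50lbs
```

The warehouse printout gets the readable version, and each of the other channels gets the encoding its own control language requires.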
Now, I’m not saying you should not do input validation. It adds great usability features and might limit the impact of other programming mistakes. What I’m saying is that input validation alone isn’t enough. You’ve gotta have output encoding to truly solve it right. We need to ensure that architects & developers have a deeper understanding of what the problem really is in order for them to naturally build systems that are resistant to these types of attacks.
