SQL injection prevention
By: Bárbara Borges Ribeiro
INSTALLATION NOTE: You can get both
DBI from CRAN:
SQL injections are attacks to apps, in which the attacker exploits a security vulnerability to send whatever SQL statements they wish to the database. This concept was popularized in one funny xkcd comic:
In the Shiny app at the end of the previous article, there’s a security vulnerability because we paste
input$nrows right into a query:
If a malicious user manages to modify the input to accept something other than a number (possibly a string containing more SQL commands), we’d have no way to catch that and prevent it. In this case, the solution is very simple: just coerce
input$nrows into an integer and, to be extra safe, grab the first element only (in case the user passed in a vector with more than one element). The relevant command then becomes:
A more challenging example
In general, however, the solution won’t be so easy. As the comic above illustrates, most injection attacks are due to some clever manipulation of a string. To make this concrete, consider a slightly different app. Imagine that this contains highly sensitive information about each city. Each city knows its own ID (let’s assume these are complicated and encrypted IDs, not a simple sequence), but none of the other IDs. If you were a malicious user trying to access other cities’ information, you could try a brute-force attack. This would imply systematically checking all possible IDs until the correct ones are found. This would be hard enough (and take long enough) if you were just trying to access one of the cities. But what if you are really extra evil and want all the records? Take a look at the app:
The expected use of this app is that you will provide your city’s ID (ex:
5). Then, the app forms the query using your input (ex:
SELECT * FROM City WHERE ID = '5';). Finally, this query returns a single row of data corresponding to your city. But if you type
' OR 1 = 1 OR ' in the input box, the query that is formed is:
SELECT * FROM City WHERE ID = '' OR 1 = 1 OR '';. This returns the entire table, all 3427 rows. Congratulations – you’ve just performed a successful SQL injection! Let’s examine exactly what this query asked the database for. It is requesting all rows from the
City table, for which at least one of the following conditions apply:
IDfield is equal to the empty string;
- 1 = 1;
- the empty string.
Obviously, 1 = 1 is always true, so that condition applies to all rows in the table and the query returns them all. The first and third condition will always be false (there’s no city whose
ID is the empty string, and the empty string by itself returns
NULL), but that doesn’t matter because you only need one of these to be true. Those other conditions are only there to make the query semantically valid. The purpose of the first condition is simply to close the single quote, so that
1 = 1 is evaluated as SQL code (rather than being inside of the string literal, which would be useless for us). The purpose of the third condition is the reverse: to open the single quote, so that the final quote that is hardcoded in the app is matched.
Note that the trick we used the first time (coercing the input to the expected class) won’t work here. In fact, this is a semantically correct string, and it is meant to be a string. What we really want to do is to prevent the user from adding single quotes at will. While we cannot actually prevent the user from entering whatever they wish, we can (and should!) always sanitize their inputs. In this case, this means making sure that any potential single quotes are double escaped (i.e. add another single quote immediately after it).
With the latest release of
DBI, we have a new function
sqlInterpolate(), whose purpose is to safely interpolate values into an SQL string, therefore protecting you from injection attacks. If you substitute the
query line in the app above by the following two lines, and enter the same input as before (
' OR 1 = 1 OR '), you will get an empty table back (as you should):
This is because the query that is generated has changed from
SELECT * FROM City WHERE ID = '' OR 1 = 1 OR ''; to the SQL-injection safe
SELECT * FROM City WHERE ID = ''' OR 1 = 1 OR ''' ;. If you just enter a valid ID, however, the query stays the same (since there’s nothing to escape). Ex: In you enter
5, the query is
SELECT * FROM City WHERE ID = '5' ;.
sqlInterpolate() function takes two mandatory arguments (the connection and the SQL string x number of variables to interpolate) and x number of named values to interpolate into the query. In order for this to work, the SQL string must include placeholders for all the values that you want to interpolate: these are the variables, and each must start with a question mark (in the example above, our SQL string only contains one variable,
?id). Then you must specify the value of each variable as a separate argument to
sqlInterpolate() (in our example,
id = input$ID). The function will sanitize all the values and then place them in the appropriate place in the original SQL string. The result of this operation becomes the actual SQL query used later on.
Imagine that you want to let the user select three cities – you just just add more variables (and the corresponding values) to interpolate:
As is, this app produces the query
SELECT * FROM City WHERE ID = '5' OR ID = '' OR ID = '';, which returns just the row corresponding to
ID = '5'. If you add, a second valid
10), then it will produce the query
SELECT * FROM City WHERE ID = '5' OR ID = '10' OR ID = '';, and returns a table with those two rows.
In summary, you should always sanitize your user-provided inputs. If they’re numbers, coerce them to the integer or the numeric class. If they’re strings that go into a SQL query, use
sqlInterpolate(). If it’s something more complicated, make sure you process it in a way such that a SQL injection is impossible.
We love it when R users help each other, but RStudio does not monitor or answer the comments in this thread. If you'd like to get specific help, we recommend the RStudio Community as well as the Shiny Discussion Forum for in depth discussion of Shiny related questions and How to get help article for a list of the best ways to get help with R code.comments powered by Disqus