0

I have a task to develop a website that allows storing of data resources from other APIs and users can analyse and visualize this data. This data is sensitive and data leakage should be minimal.

I implemented data isolation through user row-level multi-tenancy [acts_as_tenant][1]. I could not implement Schema-level multi-tenancy because there are database dependencies that will not work well with schema-level.

Users are allowed to submit read-only queries and receive the results. Of course, I managed to parse the query strings and disallow dangerous keywords such as DROP, INSERT, etc.

The database stores sensitive resources, so for every query, we either filter the results user_unique_id = 'value' or append a WHERE or ON clause.

But with this solution, as queries get more complex it also becomes difficult to parse the string safely to guarantee data result isolation. And if someone tries hard enough they could find data leakage loopholes because it's not possible to handle all query combinations.

I have no experience managing large applications. Id running multiple databases, simply a database per user, a recommendable implementation? Since users will not be in control of the databases, I'll have to maintain the versioning and migrations and ensure they're all identical (which sounds a bit too (code) repetitive).

How would one approach such a situation?

def query
  sql = params[:query]
  inspect_sql(sql)
  
  @result = execute_statement(sql)
  render json: @result
end

private

def inspect_sql(sql)
  # Raise Exception for non read-only queries
end

# Execute query
def execute_statement(sql)
  begin
    results = ActiveRecord::Base.connection.execute(sql)
    return results.to_a unless results.blank?
    { success: false }
  rescue Exception
    { success: false, exception: true }
  end
end

1 Answer 1

2

This sentence:

And if someone tries hard enough they could find data leakage loopholes because it's not possible to handle all query combinations.

tells you that accepting SQL as user input will always be a security hole big enough to drive a freight train through. Instead, you need a rigid API that defines the query parameters, including any additional related data the client would like returned. There is no standard way for designing this. Valid and secure solutions can be:

  • Query string parameters, like ?title=X where you include a search on the "title" field if provided.

  • A JSON structure, like { "title": "X" } which follows the same logic as query string parameters.

Either way, user input should not be directly executed as SQL. These parameters should be used to build a prepared statement that gets executed without concatenating SQL and user input.

select x from y where title = ?

I am reluctant to delegate this sort of thing to a stored procedure, because many times the SQL is dynamic. You need to be extra careful when executing dynamic SQL in most database vendors, because it provides another avenue for SQL injection attacks.

Start with designing a safe API that guards against SQL injection attacks. When SQL queries need to be dynamic, build the SQL query up, and use parameterized queries/prepared statements, which provide many safeguards against SQL injection attacks.

conditions = []
joins = []
params = []

if request.includes.foo
  joins << "foo on x = ?"
  params << request.includes.foo
end

if request.title
  conditions << 'title = ?'
  params << request.title
end

sql = # build sql based on conditions, joins, etc

results = Foo.where(sql, params)

Once you have this foundation, build on top of that to provide a rigid API that clients can call.

Accepting SQL as a string and checking it for malicious code is not a burden you want to bear.

Not the answer you're looking for? Browse other questions tagged or ask your own question.