We use cookies to ensure that we give you the best experience on our website. By continuing to use the website you agree for the use of cookies for better website performance and a personalized experience.

Integrating Apache Druid with Cube.js for Fine-Grained Access Control

Beata Zawiślak
.
June 20, 2024
Integrating Apache Druid with Cube.js for Fine-Grained Access Control
Beata Zawiślak
June 20, 2024
.
X MIN Read
June 20, 2024
.
X MIN Read
June 20, 2024
.
X MIN Read

This article explores how Apache Druid can be integrated with the Cube.js tool to implement fine-grained access control, enabling precise and secure data management in a multi-tenant environment.

Fine-Grained Access Control

Fine-grained access control (FGAC) is a mechanism that allows precise control over access to specific resources or actions within a system. Typically, coarse-grained access control operates at the level of entire applications or databases. Using FGAC enables more granular control at the level of individual data items, functionalities, or operations. 

In the context of databases, fine-grained access control allows administrators to define security policies that exactly define which users or with what roles are allowed to perform specific actions (such as read, write, update, or delete) on particular rows, columns, or sets of data within a database table. 

When implementing fine-grained access control, there are two approaches:

  • Row-level access control: This approach works by controlling access to individual rows in a table based on user attributes or membership in specific groups. Thanks to this, administrators can ensure that each user only sees data to which they have permission to access. 
  • Column-level access control: This approach involves restricting access to specific columns within a table based on user identity or role. For example, only selected users might be able to view sensitive columns containing salary or personal identification numbers while others not.

Fine-Grained Access Control in Druid

Defining FGAC directly in Druid is currently not possible. Right now Druid provides the possibility to set control accesses at the level of entire applications or databases. For more information, check out the Druid documentation here.

Row-Level Access Control

In this tutorial, we will focus on setting up row-level access control per tenant in the same database instance. Cube.js allows users to implement this in different ways. One strategy is to create separate data models per tenant. Another way is to rewrite the query by filtering the data before execution. This tutorial will demonstrate how to configure row-level security using the second approach.

Multi-Tenancy in Cube.js

Multi-tenancy is supported by default in Cube.js at both the database and data model levels. If you want to use multiple drivers, meaning that one of your customers has data in Postgres while others have data in MongoDB with one Cube instance, it is also offered out of the box.

Cube.js provides six configuration options to set up multi-tenancy. You can use just a few, depending on your needs:

  • ‘contextToAppId’
  • ‘contextToOrchestratorId’
  • ‘driverFactory’
  • ‘repositoryFactory’
  • ‘preAggregationsSchema’
  • ‘queryRewrite’

All the above options should be implemented in the ‘cube.js’ or ‘cube.py’ file. Check the Cube.js documentation for more detailed information on multi-tenancy options.

Also, Cube.js offers various configurations beyond FGAC, depending on your use case. Read more about multi-tenancy in the Cube Multitenancy | Cube Docs.

Integrating Druid with the Cube tool for Row-Level Access Control

Setting up Druid

1. Install Druid locally (in this tutorial, we use Apache Druid 29.0.1 release):

tar -xzf apache-druid-29.0.1-bin.tar.gz
cd apache-druid-29.0.1

2. Start-up Druid services:

./bin/start-druid

3. Load sample data:

To load sample data, use the step-by-step instructions in the Apache Druid Quickstart (local) tutorial.

During loading sample data, in the section ‘Transform data and configure schema/Transform’ add a new column transformation to create the ‘userID’ column.

Example:

userID column.

This column is only needed to provide a better understanding of row-level access control in Cube.

Configuring Cube

1. Create a dedicated folder for your project from the terminal:

mkdir cube-project
cd cube-project

2. Create the ‘docker-compose.yml’ file:

touch docker-compose.yml

‘docker-compose.yml’

version: "2.2"
 
services:
  cube:
    image: cubejs/cube:latest
    ports:
      - 4000:4000
      - 15432:15432
    environment:
      - CUBEJS_DEV_MODE=true
    volumes:
      - .:/cube/conf
    network_mode: "host"

Note: Remember to add ‘network_mode: 'host'’ to your ‘docker-compose.yml’, if you're using Linux as the Docker host OS.

3. Create a ‘.env’ file with the necessary information to connect Cube with your local Druid instance:

CUBEJS_DB_TYPE=druid
CUBEJS_DB_URL=http://localhost:8888
CUBEJS_DB_USER=druid_system
CUBEJS_DB_PASS=password2

We can run the cube image without defining the ‘.env’ file, but then we will have to pass all configuration information every time on the web page.

4. In your terminal start the Cube container by typing:

docker compose up

Note: If you don't have Docker installed, follow this tutorial. You might have to start the container with root privileges.

5. Go to http://localhost:4000/ in your browser.

6. Select the Data Model section to create a model from the data source.

Data model section

7. Generate a Data Model in YAML or JavaScript format. In this tutorial, we generate all models as YAML files.

Generated Data Model files

8. Build your first chart in Cube.js in the Playground section.

Playground section

9. Select the Measures and Dimensions that you are interested in.

Measures and dimensions

10. Select the most suitable chart type.

Chart type

11. Run the query and admire your first created chart in Cube 🙂 Well done!

Created chart in Cube.js

Defining Row-Level Security by using the ‘queryRewrite’ Method 

In our scenario, we will read data within only one database by different users. The easiest way to provide row-level security is to implement the ‘queryRewrite’ hook. 

This security hook is very powerful because of the way it works. It is called on each request to check your query before it gets processed. Thanks to that functionality, you can implement any type of needed security checks for your app by rewriting input queries accordingly:

1. In your ‘cube-project folder’, which we created earlier, create a new file named ‘cube.js’ to provide custom changes:

touch cube.js

2. Copy and paste the below code into the created file:

module.exports = {
  queryRewrite: (query, { securityContext }) => {
      if (!securityContext.role) {
        throw new Error('No role found in Security Context!')
      }
      else {
        query.filters.push({
          member: 'wikipedia.userid',
          operator: 'equals',
          values: [securityContext.role],
        });
      return query;
    }
  }
};

But what exactly does the code above do? 

We assume that our security context will have a field called ‘role’ to distinguish user permissions for specific rows.

The available unique values from the column called ‘userID’ from the ‘Wikipedia’ data source will serve as user roles. Based on them, we will divide individual rows for users.

3. Add ‘contextToAppId’ method.

Using that method we create a new app ID for each role which prevents users from seeing each other’s data. An app ID is used like a caching key for plenty of in-memory structures such us data model compilation results, etc.

You can copy the below code and paste it before the ‘queryRewrite’ method in the ‘cube.js’ file:

contextToAppId: ({ securityContext }) => {
    `CUBE_APP_${securityContext.role}`;
  }

4. Restart your Cube container to see the provided changes.

5. Check the result at the ‘http://localhost:4000/’ address.

When you try to run the query, you will receive an error called “No role found in Security Context!”.

Don’t worry, that means our code works correctly 🙂. We just tested an issue when the role field was not provided in a security context.

We don’t have any authentication mechanism provided and that’s why we have to extend the ‘security context’ manually. In development mode, you can easily change the ‘security context’ to check the functionality of your code. If you want to read more information about it, check the Cube.js documentation.

6. Add a custom ‘security context’ by clicking the Edit Security Context button located on the top right side of the window.

Security context

7. Run an example query and see the result.

Example query

8. Change the ‘role’ value in the ‘security context’ into “2” and run the above query one more time.

Example query 2

9. Pass another number into the ‘role’ field to check that the user with that ID doesn't have access to any rows.

Example query 3

Congratulations! You implemented basic row-level security access based on user ID/roles.

SQL API

The SQL API enables Cube to deliver data over the Postgres-compatible protocol to different data applications such as Superset, Tableau, Jupyter, Census, Retool, and many more. The full tools list is shown here.

To set dynamic username and password verification for the Cube, we have to define the ‘checkSqlAuth()’ function in the ‘cube.js’ configuration file.

In our example case, we’ve hardcoded the authentication part just to show the mechanism. If you want to authenticate requests to the SQL API with LDAP, please follow this tutorial on the Cube website.

1. Add the below code to your ‘cube.js’ file:

checkSqlAuth: async (req, user_name, password) => {
      if (user_name === 'user1') {
        if (password && password !== 'user1') {
          throw new Error('Access denied');
        }
        return {
          password,
          securityContext: {
            role: '1'
          },
        };
      }

      else if (user_name === 'user2') {
        if (password && password !== 'user2') {
          throw new Error('Access denied');
        }
        return {
          password,
          securityContext: {
            role: '2'
          },
        };
      }
    
      throw new Error('Access denied');
    }
  };

2. Restart the container.

We will try to query Cube with PostgreSQL from the terminal. Make sure you have the ‘postgres-client’ installed on your machine. Use the below code for that purpose:

psql -h localhost \
  -p 15432 \     
  -U user1 test 

3. Provide the password “user1”.

4. Try to execute the sample queries:

1) List of relations
\d 

2) Number of all rows
select count(*) from wikipedia;

3) Data group by "channel" column
SELECT "wikipedia".channel "wikipedia__channel", count(*) "wikipedia__count" FROM "wikipedia" GROUP BY 1 ORDER BY 2 DESC LIMIT 3;

4) Data group by "channel" and "userID" column to show row-level security
SELECT "wikipedia".channel "wikipedia__channel", "wikipedia".userid "user" , count(*) "wikipedia__count" FROM "wikipedia" GROUP BY 1, 2 ORDER BY 2 DESC LIMIT 3;
Executing sample queries

The last query's result clearly shows that row-level security is configured correctly. User1 can see only rows with the ‘userID’ column equal to "1". 

You can now log in as User2 and see the differences between the query results.

Conclusion

Apache Druid doesn’t offer setting up row-level security out of the box. However, it is possible to provide this functionality using tools like Cube.js. While it may not be a perfect solution for Druid users, it allows the creation of a multi-tenant architecture while ensuring data security for clients.

Subscribe and stay in the loop with the latest on Druid, Flink, and more!

Thank you for joining our newsletter!
Oops! Something went wrong while submitting the form.
Deep.BI needs the contact information you provide to contact you. You may unsubscribe at any time. For information on how to unsubscribe and more, please review our Privacy Policy.

You Might Also Like