
Connecting Druid with AWS EMR via VPN to run Hadoop Indexing Jobs

Jan Kogut
June 18, 2017

It is common to run a hybrid infrastructure: your own data center combined with some services in a public cloud. At Deep.BI we have built our private cloud on rented servers, and we also use external clouds such as AWS and Azure.

In this post we describe how to connect a Druid cluster hosted in a private data center to Amazon's hosted Hadoop service, EMR (Elastic MapReduce), in order to run Hadoop Indexing Jobs. Running these jobs solves the Kafka Indexing Service "not merging segments" problem.

First, we modified the EMR security groups to accept connections from our Druid MiddleManagers. HDFS access was not a problem, as our HDFS client (snakebite) connects to a webHDFS service which listens on port 8020. Unfortunately, when we tried to access EMR through its public DNS name, we encountered a java.net.ConnectException: Connection refused error.
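A quick way to tell a security-group problem (packets silently dropped, so the connection times out) from a service that is reachable but not listening on that address (connection refused) is a plain TCP probe from a MiddleManager. The following is a minimal sketch, not part of our original setup; the function name `check_tcp` is hypothetical, and the host and port you pass in are whatever your own cluster exposes.

```python
import socket


def check_tcp(host: str, port: int, timeout: float = 3.0) -> str:
    """Probe host:port and classify the outcome.

    Returns one of: "open", "refused", "timeout", "unresolved", "error".
    """
    try:
        # create_connection resolves the name and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except ConnectionRefusedError:
        # The host answered, but nothing accepts connections on that port.
        return "refused"
    except socket.timeout:
        # No answer at all; typical for a firewall or security group drop.
        return "timeout"
    except socket.gaierror:
        # The hostname could not be resolved from this machine.
        return "unresolved"
    except OSError:
        # Anything else, for example "network unreachable".
        return "error"
```

Calling `check_tcp("<emr-master-hostname>", 8020)` from a MiddleManager returns `"refused"` in the situation described above, while a missing security group rule would more likely show up as `"timeout"`.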

The message was clear: we needed direct access to the EMR cluster using its local hostnames. We configured the ec2-to-emr router and used a VPN to access EMR.

Finally, our MiddleManagers were able to connect to the EMR cluster using its local IP. Unfortunately, we still encountered the same java.net.ConnectException: Connection refused error. The cause was that the Hadoop client picked one of our network interfaces at random, either eth0 (all traffic) or tun0 (VPN), to communicate with the cluster. We tried to "convince" it to use the tun0 interface, which was a partial success: the Connection refused error disappeared, but a new one appeared: unknown tun0 interface. EMR had passed our Hadoop option on to its own cluster, which had no idea about our tun0 interface, nor was it supposed to.
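When debugging this kind of interface confusion, it helps to ask the operating system which local address it would use to reach a given EMR node. The sketch below is an illustration we add here, not the Hadoop client's own selection logic; `source_address_for` is a hypothetical helper name. It relies on the fact that connecting a UDP socket sends no packets and only makes the kernel pick a route and a source address.

```python
import socket


def source_address_for(dest_ip: str, port: int = 9) -> str:
    """Return the local IPv4 address the kernel would use to reach dest_ip."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
        # connect() on a UDP socket transmits nothing; it only binds the
        # socket to the route (and therefore the interface) for dest_ip.
        s.connect((dest_ip, port))
        return s.getsockname()[0]
```

If the address returned for an EMR private IP is the one assigned to tun0, the routing table already sends that traffic through the VPN, and the remaining problem lies in name resolution rather than in interface selection.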

Fortunately, adding proper entries for EMR local name resolution to /etc/hosts on our Druid MiddleManagers solved all of the networking problems.
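As an illustration of what such entries can look like, here is a small sketch that builds /etc/hosts lines from a list of EMR private IPs. It assumes the default AWS private DNS naming scheme (ip-a-b-c-d.ec2.internal in us-east-1, ip-a-b-c-d.<region>.compute.internal elsewhere); a VPC with a custom DHCP option set may use different names. The function names and the example IP are hypothetical and not taken from our cluster.

```python
import ipaddress
from typing import Iterable, List


def emr_private_hostname(ip: str, region: str = "us-east-1") -> str:
    """Build the default AWS private DNS name for a private IPv4 address."""
    addr = ipaddress.IPv4Address(ip)  # raises ValueError on malformed input
    label = "ip-" + str(addr).replace(".", "-")
    # Assumption: default VPC naming; us-east-1 uses a special suffix.
    if region == "us-east-1":
        suffix = "ec2.internal"
    else:
        suffix = f"{region}.compute.internal"
    return f"{label}.{suffix}"


def hosts_entries(ips: Iterable[str], region: str = "us-east-1") -> List[str]:
    """Return /etc/hosts lines: IP, fully qualified name, then short name."""
    lines = []
    for ip in ips:
        fqdn = emr_private_hostname(ip, region)
        short = fqdn.split(".", 1)[0]
        lines.append(f"{ip}\t{fqdn} {short}")
    return lines
```

For example, `hosts_entries(["10.0.1.23"])` yields a single line mapping 10.0.1.23 to ip-10-0-1-23.ec2.internal and its short name. The output would be appended to /etc/hosts on every MiddleManager, one line per EMR node.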
