Enabling EMR Self-Service With ChatOps

The Threat Stack Security Operations Center loves data. And Threat Stack has a lot of data! Our SOC analysts leverage Amazon Elastic MapReduce and Jupyter notebooks to query the raw data that the Threat Stack Cloud Security Platform® receives to develop new detection and analysis mechanisms. Making something so powerful available to our analysts is a cross-team effort. From a security standpoint, the Platform Security team wondered how analysts could safely spin up an EMR cluster in our production environment to perform advanced analysis.

To start: An EMR lifecycle manager is extremely helpful here, and we were fortunate to have one that our architecture team wrote. Our lifecycle manager takes a JSON-based configuration and handles spinning up a cluster with a configuration that can be controlled via a collection of HTTP endpoints. But that API only works for internal machines that want EMR clusters. If we’re adhering to the principle of least privilege (and we are), giving access to that API would allow analysts to affect other clusters that exist to process customer data.

So how can we solve this problem? Surprisingly: ChatOps and AWS Lambda. We wrote a function that consumed events from the Slack Events API, and depending on the use of a certain phrase, it would either start, get the status of, or destroy a cluster used for data exploration.
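The phrase-based dispatch described above can be sketched roughly as follows. This is a minimal illustration, not the production function: the trigger phrases, the Action type, and actionFor are all hypothetical names, and a real handler would go on to call the lifecycle manager for the one exploration cluster it is allowed to touch.

```go
package main

import (
	"fmt"
	"strings"
)

// Action is what the bot should do with the single exploration cluster.
type Action int

const (
	ActionNone Action = iota
	ActionStart
	ActionStatus
	ActionDestroy
)

// String returns a readable name for the action.
func (a Action) String() string {
	switch a {
	case ActionStart:
		return "start"
	case ActionStatus:
		return "status"
	case ActionDestroy:
		return "destroy"
	default:
		return "none"
	}
}

// triggers maps hypothetical phrases to actions. The real phrases are not
// given in the article; these are placeholders.
var triggers = []struct {
	phrase string
	action Action
}{
	{"start the cluster", ActionStart},
	{"cluster status", ActionStatus},
	{"destroy the cluster", ActionDestroy},
}

// actionFor returns the action for the first trigger phrase found in the
// message text, ignoring case and surrounding whitespace.
func actionFor(text string) Action {
	normalized := strings.ToLower(strings.TrimSpace(text))
	for _, t := range triggers {
		if strings.Contains(normalized, t.phrase) {
			return t.action
		}
	}
	return ActionNone
}

func main() {
	for _, msg := range []string{"Please start the cluster", "cluster status?", "good morning"} {
		fmt.Printf("%q -> %s\n", msg, actionFor(msg))
	}
}
```

Because any message that matches no phrase maps to ActionNone, ordinary channel chatter is ignored rather than treated as an error.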

This method fits our security requirements in a few ways, specifically:

The function can only affect one specific cluster, as opposed to all of them.
The analyst doesn’t have direct access to the customer data.
Accidental destruction of the cluster doesn’t affect the delivery of the Threat Stack service.

Access to the notebooks analysts use is authenticated using SSO, so we’re able to maintain positive control over that as well. Ultimately, it lets the SOC analyze the data they need when they want, without a platform engineer having to submit the request — which is how it is done today.

While many examples for Slack bots are in Node.js and Python, much of our security tooling at Threat Stack is written in Go, so we naturally used that. This presented a small challenge in that existing examples were built around a Go application running on a server full time (leveraging net/http’s http.ListenAndServe). After a bit of experimentation, we got a sample function working, which is shared below.
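One way to bridge the gap between server-style examples and a function-style deployment is to keep the event logic free of any transport. The sketch below is an assumption about how that could look, not the code from our repository: Response, handleEvent, and httpAdapter are hypothetical names, and only the net/http adapter is shown because it needs nothing beyond the standard library. On Lambda, a similarly thin adapter would be handed to the Lambda library instead of calling http.ListenAndServe.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"strings"
)

// Response is a transport-neutral reply: a status code and a body.
type Response struct {
	Status int
	Body   string
}

// handleEvent holds the logic and knows nothing about net/http or Lambda.
// Here it only rejects empty bodies; a real handler would parse the event.
func handleEvent(body string) Response {
	if strings.TrimSpace(body) == "" {
		return Response{Status: http.StatusBadRequest, Body: "empty body"}
	}
	return Response{Status: http.StatusOK, Body: "ok"}
}

// httpAdapter exposes handleEvent in the long-running server style.
func httpAdapter(w http.ResponseWriter, r *http.Request) {
	raw, err := io.ReadAll(r.Body)
	if err != nil {
		http.Error(w, "could not read body", http.StatusBadRequest)
		return
	}
	resp := handleEvent(string(raw))
	w.WriteHeader(resp.Status)
	io.WriteString(w, resp.Body)
}

func main() {
	// Exercise the adapter in memory instead of starting a server.
	rec := httptest.NewRecorder()
	req := httptest.NewRequest(http.MethodPost, "/slack/events",
		strings.NewReader(`{"type":"event_callback"}`))
	httpAdapter(rec, req)
	fmt.Println(rec.Code, rec.Body.String())
}
```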

Setting up Slack

Before you get started, you’ll need some authentication information from Slack. To get it, visit the Slack API center and list your applications, then click “Create App” and name your Slack bot. At this point you can modify how the application will show up in your Slack instance. You can also specify how you want Slack to interact with your application. We’ll do that a bit later, but for now, copy the “Signing Secret,” and then go to “OAuth & Permissions” to give your bot the following OAuth scopes:

channels:history
chat:write
groups:history

Feel free to modify these scopes. For example, if you expect your bot to work over direct messages rather than in public or private channels, adjust the permissions accordingly. You’ll still need the chat:write permission, since that’s how the bot communicates what it’s doing. After you update the OAuth scopes, click the “Install App to Workspace” button, which generates a bot user OAuth access token. Copy that as well. You’ll need both the signing secret and the access token (which generally starts with xoxb-) later.

Building the Go integration

The whole integration is available on the Threat Stack GitHub account. The code has comments, but we’ll describe it here at a high level, too. Our sample application relies on the AWS Lambda library for Go, which has you pass in the function you want to run. The receiveSlackEvent function performs the work, with a handful of helper functions to keep that main function readable. One of them, setLogLevel, makes it easier to set the log level if you’re working in a development environment. Another, verifyReq, validates the message we receive from Slack. Verifying that the request really comes from Slack is very important, because it ensures that an adversary didn’t craft a malicious request. There are some risks here, but given our threat model, we’re OK with them.
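To make the verification step concrete, here is a standard-library sketch of Slack’s signing scheme: an HMAC-SHA256 over "v0:", the request timestamp, ":", and the raw body, keyed with the signing secret and compared against the X-Slack-Signature header. It is not the verifyReq from our repository; sign, verifySignature, and the five-minute window constant are illustrative names, and the secret in main is a placeholder.

```go
package main

import (
	"crypto/hmac"
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"strconv"
	"time"
)

// maxSkewSeconds rejects requests whose timestamp is more than five minutes
// away from the current time, which limits replay of captured requests.
const maxSkewSeconds = 5 * 60

// sign computes the signature Slack would send for this timestamp and body.
func sign(secret, timestamp, body string) string {
	mac := hmac.New(sha256.New, []byte(secret))
	mac.Write([]byte("v0:" + timestamp + ":" + body))
	return "v0=" + hex.EncodeToString(mac.Sum(nil))
}

// verifySignature checks the timestamp window and then compares signatures
// in constant time. timestamp and signature come from the
// X-Slack-Request-Timestamp and X-Slack-Signature headers.
func verifySignature(secret, timestamp, body, signature string, nowUnix int64) bool {
	ts, err := strconv.ParseInt(timestamp, 10, 64)
	if err != nil {
		return false
	}
	diff := nowUnix - ts
	if diff < 0 {
		diff = -diff
	}
	if diff > maxSkewSeconds {
		return false
	}
	expected := sign(secret, timestamp, body)
	return hmac.Equal([]byte(expected), []byte(signature))
}

func main() {
	now := time.Now().Unix()
	ts := strconv.FormatInt(now, 10)
	body := `{"type":"event_callback"}`
	sig := sign("example-signing-secret", ts, body) // placeholder secret

	fmt.Println(verifySignature("example-signing-secret", ts, body, sig, now))     // true
	fmt.Println(verifySignature("example-signing-secret", ts, body+"x", sig, now)) // false
}
```

The signature must be computed over the raw request body exactly as received, so verification has to happen before any JSON parsing or re-encoding.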

The sample application code leverages the slack-go library, which has a built-in parser for Slack Events API messages. A variety of messages can come in through the Events API, but the ones we care about are challenge events and callback events. Challenge events are how Slack verifies that the URL you give it is ready to receive Events API traffic, so that it doesn’t send a stream of events to someone who isn’t expecting them. Callback events contain the events you’re listening for. Slack has a variety of events you can subscribe to, but the ones you care about for this sample application are:

message.channels
message.groups (if your bot will be in private channels)
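The challenge handshake can be illustrated without any Slack library. In the sketch below, which assumes only the documented shape of the url_verification payload, envelope and challengeResponse are hypothetical names; the sample application itself uses the slack-go parser instead.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// envelope holds the two fields needed to recognize a challenge event.
type envelope struct {
	Type      string `json:"type"`
	Challenge string `json:"challenge"`
}

// challengeResponse returns the value to echo back to Slack and true when
// the body is a url_verification request. Other event types return false.
func challengeResponse(body []byte) (string, bool, error) {
	var env envelope
	if err := json.Unmarshal(body, &env); err != nil {
		return "", false, fmt.Errorf("parsing Slack event: %w", err)
	}
	if env.Type != "url_verification" {
		return "", false, nil
	}
	return env.Challenge, true, nil
}

func main() {
	body := []byte(`{"type":"url_verification","challenge":"abc123"}`)
	resp, ok, err := challengeResponse(body)
	fmt.Println(resp, ok, err)
}
```

Slack expects the challenge value back in the HTTP response, so the handler should return it with a 200 status before any callback events will be delivered.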

Once you receive a callback event, you parse the event that’s sent to you. The InnerEvent data contains the actual event that you’ve subscribed to above. This application specifically listens for a MessageEvent with the text “It’s showtime!” and responds with “Yes, it is!”
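As a rough illustration of that last step, the following sketch decodes a callback event with encoding/json and decides whether to reply. It stands in for the slack-go parsing the sample application actually uses; callback and replyFor are hypothetical names, and both the exact-match comparison and the curly-apostrophe normalization are assumptions about how the trigger text is matched.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// callback models only the parts of an event_callback payload used here.
type callback struct {
	Type  string `json:"type"`
	Event struct {
		Type    string `json:"type"`
		Text    string `json:"text"`
		Channel string `json:"channel"`
	} `json:"event"`
}

// replyFor returns the channel and reply text for a matching message event.
// An empty reply means the event should be ignored.
func replyFor(body []byte) (channel, reply string, err error) {
	var cb callback
	if err := json.Unmarshal(body, &cb); err != nil {
		return "", "", fmt.Errorf("parsing callback event: %w", err)
	}
	if cb.Type != "event_callback" || cb.Event.Type != "message" {
		return "", "", nil
	}
	// Assumption: treat a typographic apostrophe like a plain one, since
	// some clients substitute it automatically.
	text := strings.ReplaceAll(cb.Event.Text, "\u2019", "'")
	if strings.TrimSpace(text) != "It's showtime!" {
		return "", "", nil
	}
	return cb.Event.Channel, "Yes, it is!", nil
}

func main() {
	body := []byte(`{"type":"event_callback","event":{"type":"message","text":"It's showtime!","channel":"C123"}}`)
	channel, reply, err := replyFor(body)
	fmt.Println(channel, reply, err)
}
```

Posting the reply back to the channel is where the bot token and the chat:write scope come in; that call is left out here to keep the sketch free of network access.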

Deploying your code

Running GOOS=linux GOARCH=amd64 go build && zip slackevent.zip slackevent will get you a zip archive that you can upload to AWS Lambda. We used the beta HTTP API Gateway, which worked well enough for what we wanted to accomplish. Once everything’s uploaded to AWS and you have an API Gateway URL, head back to the Slack application listing and click on your application. Then go to “Event Subscriptions” and enable it. Paste in your API Gateway URL, and then under “Subscribe to Bot Events,” add the events we listed above.

At this point, you should invite your new Slack bot into a channel and write some messages into the channel. For each message, you should see a Lambda invocation, and if you write “It’s showtime!” you should get a response. If you did: Congratulations! If you didn’t, retrace your steps one at a time. Did API Gateway get the request? Was the Lambda function invoked? Are there places you could add fmt.Printf statements to see what went wrong? This was a big part of how we debugged the integration in our development environment.
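If you do sprinkle print statements through the handler, it can help to gate them behind a switch so they stay quiet in production. The sketch below is one possible approach and not the setLogLevel function from our repository: the LOG_LEVEL environment variable, debugEnabled, and debugf are hypothetical names.

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// debugEnabled reports whether a level string asks for debug output.
func debugEnabled(level string) bool {
	return strings.EqualFold(strings.TrimSpace(level), "debug")
}

// debugf writes to stderr only when the hypothetical LOG_LEVEL environment
// variable is set to "debug".
func debugf(format string, args ...interface{}) {
	if debugEnabled(os.Getenv("LOG_LEVEL")) {
		fmt.Fprintf(os.Stderr, "DEBUG "+format+"\n", args...)
	}
}

func main() {
	// With LOG_LEVEL unset these calls print nothing.
	debugf("received %d bytes from API Gateway", 42)
	debugf("event type: %s", "event_callback")
}
```

Anything a Lambda function writes to standard output or standard error ends up in its CloudWatch Logs stream, which is where you would look for these lines.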

This was originally posted on the Threat Stack blog.