Five Lessons We Learned on Our Way to Centralized Authentication
In many startups, centralized authentication is a “future us” problem. Setting up centralized auth is useful for managing your network, but requires time, domain knowledge, and patience to get many of the technical solutions working. Compare this with the ease of user management via configuration management (CM) tools that your DevOps teams are already using - they work well enough (and, did we mention, are already in place?) - so it makes total sense that many organizations “punt” on this issue.
However, once your organization grows to a certain size, managing users through CM becomes a hassle. For one thing, not every system relies on UNIX authentication (Jenkins and Grafana, for example), so you’ll need to configure those separately, possibly outside of your CM platform. As you add monitoring and oversight to your network, this gets confusing quickly, and keeping passwords in sync is difficult at best. On the security side, you may forget to disable an account after an employee has left, and onboarding can take longer than it should while you add accounts to a variety of systems.
Read on to find out what we learned deploying LDAP at Threat Stack - complete with a few bits of open source code!
Lesson #1: Go with what you know (if it makes sense)
One of the de facto standards for storing and querying directory data is LDAP, the Lightweight Directory Access Protocol, and conveniently, it can store passwords and handle authentication as well. Other protocols, such as Kerberos, may be better suited to authentication, but deploying Kerberos is a larger and more complicated project. We felt that LDAP would let us achieve some quicker wins, as long as we ensured a secure deployment with enforced TLS. Our identity management requirements are also fairly simple, and we don’t need to integrate with anything that uses a proprietary authentication system.
Having worked with both 389DS and OpenLDAP in the past, we ended up selecting OpenLDAP because it seemed easier to automate its deployment and maintenance tasks. We prefer to rely on open source tooling because, generally speaking, that means we can support, debug, and most importantly, automate it more easily than a proprietary solution. The #openldap channel on the Freenode IRC network also has knowledgeable people who generally respond to questions, as long as you’ve demonstrated that you’ve gone through the documentation. This can be frustrating at times, since the OpenLDAP documentation can be lacking and inconsistent. For those looking to begin with a test deployment, the Ubuntu OpenLDAP Server Guide is a helpful place to start.
Lesson #2: Build your own OpenLDAP package
We started out using the version of OpenLDAP that ships with Ubuntu. This let us get going quickly, but it presented a few issues. First, the Ubuntu and Debian packages automatically configure the server and start it after installation, which makes managing the installation with Chef more difficult than it needs to be. More importantly, distribution packages can lag behind upstream releases, and trying to get help from the OpenLDAP community while running a distribution-provided version of OpenLDAP is difficult at best.
This became a factor for us when we were trying to debug an interaction between OpenLDAP and our Chef Custom Resource for managing replication. Building our own OpenLDAP server package that installs into /opt lets us run the latest version of the server while leaving the Ubuntu userspace alone, and it let us capture useful debugging information to boot.
We created our own package by compiling OpenLDAP and packaging it with FPM, then handled post-install configuration with Chef. It’s helpful to see how Ubuntu handles its post-install configuration by downloading the package source with apt-get source slapd. Our Chef cookbook writes the initial configuration to disk, runs the appropriate slapadd command, and then sets a node attribute recording that the configuration has been done. Because later Chef runs check that attribute and skip the bootstrap, the action is idempotent.
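The run-once guard described above can be sketched outside Chef in a few lines of Python. This is a minimal sketch under assumptions, not our cookbook: a JSON state file stands in for the Chef node attribute, and `bootstrap_once`, the state path, and the `slapd_configured` key are hypothetical names chosen for illustration.

```python
import json
import os
import tempfile
from typing import Callable


def bootstrap_once(state_path: str, key: str, bootstrap: Callable[[], None]) -> bool:
    """Run `bootstrap` unless `key` is already recorded in the state file.

    Returns True if the bootstrap ran on this call, False if it was skipped.
    (Hypothetical helper: the JSON file stands in for a Chef node attribute.)
    """
    state = {}
    if os.path.exists(state_path):
        with open(state_path) as f:
            state = json.load(f)

    if state.get(key):
        return False  # configured on an earlier run: nothing to do

    # e.g. write the initial cn=config LDIF, then run slapadd.
    # If this raises, the marker below is never written.
    bootstrap()

    state[key] = True
    # Write to a temp file in the same directory, then atomically replace,
    # so a crash mid-write can't leave a truncated state file behind.
    fd, tmp_path = tempfile.mkstemp(dir=os.path.dirname(state_path) or ".")
    with os.fdopen(fd, "w") as f:
        json.dump(state, f)
    os.replace(tmp_path, state_path)
    return True
```

Recording the marker only after the bootstrap callable returns means a failed slapadd is retried on the next run instead of being silently marked as done.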
Lesson #3: Replication is a hard thing to get right the first time
The Ubuntu guide we referenced above is decent for getting your first deployment up and running. It has a few downsides, however, one of which is that all writes to the directory must go to the provider server. This raises a question: do you have a way of tagging which server in your infrastructure handles writes and which handles reads? Clever use of Chef roles could accomplish this, but this is not the future we were promised in 2016. Instead, we set up Multi-Master Replication (MMR) so we can send writes to either server without adding complexity to our setup.
Setting up MMR can be complicated, and the documentation and guides available online vary in quality. At a high level, you’ll need to enable the accesslog and syncprov overlays. The syncprov overlay provides the replication service to peers, and the accesslog overlay records each write so that peers know what changed and what to replicate. Then you’ll need to configure olcSyncrepl entries on the LDAP database you want to replicate so that OpenLDAP knows which peers to replicate with.
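As a rough illustration of what enabling the two overlays looks like in cn=config, the Python sketch below emits LDIF you could feed to ldapadd. It assumes the replicated database is `olcDatabase={1}mdb,cn=config` and that an accesslog database with the suffix `cn=accesslog` already exists (creating that database is a separate step). The function name is hypothetical, and the checkpoint and purge values are illustrative defaults, not tuned recommendations.

```python
def overlay_ldif(db_dn: str, accesslog_suffix: str = "cn=accesslog") -> str:
    """Return LDIF that adds the syncprov and accesslog overlays to `db_dn`."""
    records = [
        [
            f"dn: olcOverlay=syncprov,{db_dn}",
            "changetype: add",
            "objectClass: olcOverlayConfig",
            "objectClass: olcSyncProvConfig",
            "olcOverlay: syncprov",
            # Checkpoint the contextCSN after 100 writes or 10 minutes.
            "olcSpCheckpoint: 100 10",
            # Keep an in-memory log of recent writes for faster resyncs.
            "olcSpSessionLog: 100",
        ],
        [
            f"dn: olcOverlay=accesslog,{db_dn}",
            "changetype: add",
            "objectClass: olcOverlayConfig",
            "objectClass: olcAccessLogConfig",
            "olcOverlay: accesslog",
            f"olcAccessLogDB: {accesslog_suffix}",
            # Only record writes, and only successful ones.
            "olcAccessLogOps: writes",
            "olcAccessLogSuccess: TRUE",
            # Once a day, purge log entries older than 7 days.
            "olcAccessLogPurge: 07+00:00 01+00:00",
        ],
    ]
    # LDIF records are separated by a blank line.
    return "\n\n".join("\n".join(record) for record in records) + "\n"
```

The syncprov overlay is listed before accesslog, matching the order used in the OpenLDAP Administrator’s Guide delta-syncrepl example.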
It’s best to look at examples that work in production, and we found one in the Zimbra project’s zmldapenable-mmr script, specifically lines 145–191. These lines show the attributes required to set up a new database, configure that database to store accesslog information, and enable the target database (the LDAP root your records are stored in) for replication. A key part of this is the setup of the olcSyncrepl attributes on line 205, which is how your server knows which peers to replicate with.
One place we chose to deviate from Zimbra’s configuration was using certificates rather than passwords for replication. We wanted to use certificates to avoid having a password with replication permissions in plaintext on our host. To do this, you’ll need to generate a client certificate signed by a CA you trust - a good use for Vault or CFSSL.
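Putting the last two paragraphs together, a single olcSyncrepl value for delta-syncrepl with certificate authentication might look like the string built below. This is a sketch under assumptions: the function name, file paths, and hostnames are hypothetical, and the provider still has to map the certificate’s subject to a DN with read access to the replicated tree (for example via olcAuthzRegexp), which isn’t shown.

```python
def syncrepl_value(
    rid: int,
    provider_uri: str,
    searchbase: str,
    tls_cert: str,
    tls_key: str,
    tls_cacert: str,
    logbase: str = "cn=accesslog",
) -> str:
    """Build one olcSyncrepl value that binds with a TLS client certificate."""
    # A syncrepl rid is a non-negative integer of at most three digits.
    if not 0 <= rid <= 999:
        raise ValueError(f"rid must be between 0 and 999, got {rid}")
    params = [
        f"rid={rid:03d}",
        f"provider={provider_uri}",
        # Require StartTLS: fail instead of falling back to cleartext.
        "starttls=critical",
        # SASL EXTERNAL takes the identity from the client certificate,
        # so no credentials= (and no plaintext password) is needed.
        "bindmethod=sasl",
        "saslmech=EXTERNAL",
        f"tls_cert={tls_cert}",
        f"tls_key={tls_key}",
        f"tls_cacert={tls_cacert}",
        "tls_reqcert=demand",
        f'searchbase="{searchbase}"',
        # Delta-syncrepl: replay successful writes recorded in the accesslog.
        f'logbase="{logbase}"',
        'logfilter="(&(objectClass=auditWriteObject)(reqResult=0))"',
        "syncdata=accesslog",
        "schemachecking=on",
        "type=refreshAndPersist",
        # Retry every 60 seconds, indefinitely.
        'retry="60 +"',
    ]
    return " ".join(params)
```

For example, `syncrepl_value(2, "ldap://ldap2.example.com", "dc=example,dc=com", "/etc/ldap/tls/repl.crt", "/etc/ldap/tls/repl.key", "/etc/ldap/tls/ca.crt")` produces a value starting with `rid=002 provider=ldap://ldap2.example.com`.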
Lesson #4: Managing LDAP with CM tooling is complicated
One thing that makes managing OpenLDAP’s configuration complicated is that it stores its configuration in its own database (cue the “inception” noise here). This presents a problem: modern CM tools have plenty of support for reading and writing files and for restarting or signaling services as necessary, but changing configuration that lives inside a running directory server is harder, at least with OpenLDAP and Chef.
To make this easier for you, we’re open sourcing the providers (and helpers) we wrote to interface with LDAP. We use the Syncrepl resource we developed, together with Chef’s search functionality, to figure out which olcSyncrepl entries each host needs. On every Chef run, we also fetch the existing entries and compare them with our Chef search results so we can remove any that no longer need to be there. We also have a basic LDAPModify resource that works like Chef’s execute resource to set attributes that require olcSyncrepl entries to exist first (such as olcMirrorMode). You’ll find some of this logic in our helper library.
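The compare-and-remove step can be sketched independently of Chef. The Python below is not the open-sourced resource; it’s a hypothetical illustration that matches entries on their (rid, provider) pair rather than on the full string, on the assumption that slapd may normalize a syncrepl value when it is read back from cn=config. It also assumes existing values arrive with their `{n}` ordering prefixes, as ldapsearch returns them.

```python
import re
from typing import List, Tuple

_ORDER_PREFIX = re.compile(r"^\{(\d+)\}")


def _key(value: str) -> Tuple[int, str]:
    """Identify a syncrepl value by (rid, provider), ignoring any {n} prefix."""
    body = _ORDER_PREFIX.sub("", value, count=1)
    rid = re.search(r"\brid=(\d+)", body)
    provider = re.search(r"\bprovider=(\S+)", body)
    if rid is None or provider is None:
        raise ValueError(f"cannot parse syncrepl value: {value!r}")
    return int(rid.group(1)), provider.group(1)


def _order_index(value: str) -> int:
    m = _ORDER_PREFIX.match(value)
    return int(m.group(1)) if m else -1


def plan_syncrepl_changes(existing: List[str], desired: List[str]) -> Tuple[List[str], List[str]]:
    """Return (values_to_delete, values_to_add)."""
    desired_keys = {_key(v) for v in desired}
    existing_keys = {_key(v) for v in existing}
    # Delete stale values highest index first, so earlier deletions
    # don't shift the {n} indexes of values still waiting to be deleted.
    to_delete = sorted(
        (v for v in existing if _key(v) not in desired_keys),
        key=_order_index,
        reverse=True,
    )
    # Values whose (rid, provider) already exist are left untouched.
    to_add = [v for v in desired if _key(v) not in existing_keys]
    return to_delete, to_add


def syncrepl_modify_ldif(db_dn: str, to_delete: List[str], to_add: List[str]) -> str:
    """Render the planned changes as a single LDIF modify record."""
    if not to_delete and not to_add:
        return ""  # nothing to change, so skip the modify entirely
    lines = [f"dn: {db_dn}", "changetype: modify"]
    for value in to_delete:
        lines += ["delete: olcSyncrepl", f"olcSyncrepl: {value}", "-"]
    for value in to_add:
        lines += ["add: olcSyncrepl", f"olcSyncrepl: {value}", "-"]
    if to_add:
        # Mirror mode only makes sense once olcSyncrepl values exist.
        lines += ["replace: olcMirrorMode", "olcMirrorMode: TRUE", "-"]
    return "\n".join(lines) + "\n"
```

An empty plan produces no LDIF at all, so a converged host issues no modify operation on each run.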
When we originally deployed OpenLDAP with Chef, we simply cleared all the olcSyncrepl entries and added new ones on every Chef run. We then hit an interesting condition: sometimes the Chef run would hang while adding an olcSyncrepl entry. This only happened when Chef added an entry for replicating with the local host, which we never needed in the first place. Once we cleared that up, another bug appeared: clearing and re-adding syncrepl entries for hosts that OpenLDAP was still replicating with resulted in too many open file handles. The Syncrepl resource we open-sourced above handles both cases, so you can avoid these issues.
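Both bugs come down to which peers a host asks for and whether its entries stay stable between runs. One hypothetical way to handle that, sketched below, is to derive each rid from a fixed per-server numeric ID (the kind of ID MMR already requires for olcServerID) and to drop the local host by comparing normalized hostnames. The `cluster` mapping and the function names are assumptions for illustration; in practice the mapping might be assembled from a Chef search.

```python
from typing import Dict, List, Tuple


def _normalize(host: str) -> str:
    # DNS names are case-insensitive and may carry a trailing dot.
    return host.rstrip(".").lower()


def peers_to_replicate(local_host: str, cluster: Dict[str, int]) -> List[Tuple[int, str]]:
    """Return sorted (rid, provider URI) pairs for every member except this host.

    `cluster` maps each LDAP server's hostname to a fixed numeric server ID.
    Reusing that ID as the rid keeps each peer's entry identical from one
    run to the next, so unchanged peers never need to be removed and re-added.
    """
    me = _normalize(local_host)
    seen_ids = set()
    peers = []
    for host, server_id in cluster.items():
        # Syncrepl rids have at most three decimal digits.
        if not 1 <= server_id <= 999:
            raise ValueError(f"server ID for {host} must be 1..999, got {server_id}")
        if server_id in seen_ids:
            raise ValueError(f"duplicate server ID {server_id}")
        seen_ids.add(server_id)
        if _normalize(host) == me:
            continue  # never create a syncrepl entry that points at ourselves
        peers.append((server_id, f"ldap://{_normalize(host)}"))
    return sorted(peers)
```

Adding a new server to the cluster then adds exactly one entry on each existing host, and leaves the entries for the other peers untouched.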
Lesson #5: Deploy carefully
Many of our internal services run in AWS and take advantage of auto-scaling groups. This means we can test a deployment and, if it doesn’t work out, quickly replace the host with one in a known working state. That said, we still ran into issues along the way, and not all of our services are stateless. Screwing up authentication on a Cassandra host could mean many GB of data to resync, so it’s important to deploy to a few hosts first and grow slowly from there. Most of our issues revolved around making sure every host was allowed through the LDAP servers’ security group. We also needed to make sure our pam_access configuration allowed local service users (e.g., cassandra, elasticsearch). Our (secure) emergency account also proved useful.
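For the pam_access piece, the rules live in /etc/security/access.conf, where the first matching line wins. The Python sketch below renders one possible policy. The group name, the emergency account name, and the choice to allow root only from LOCAL are assumptions for illustration, not our production rules, and `access_conf` is a hypothetical helper.

```python
import re
from typing import Sequence

# Conservative pattern for local user and group names.
_NAME = re.compile(r"^[a-z_][a-z0-9_-]*$")


def access_conf(
    service_users: Sequence[str],
    allowed_groups: Sequence[str],
    emergency_users: Sequence[str] = (),
) -> str:
    """Render access.conf lines of the form `permission : users : origins`."""
    for name in (*service_users, *allowed_groups, *emergency_users):
        if not _NAME.match(name):
            raise ValueError(f"unexpected user or group name: {name!r}")
    lines = ["# Managed by configuration management; first matching rule wins."]
    lines.append("+ : root : LOCAL")
    if emergency_users:
        # Local break-glass account(s) that work even if LDAP is down.
        lines.append(f"+ : {' '.join(emergency_users)} : ALL")
    if service_users:
        # Local service users (cassandra, elasticsearch, ...) must still pass.
        lines.append(f"+ : {' '.join(service_users)} : ALL")
    if allowed_groups:
        # Parentheses mark an entry as a group name rather than a user name.
        groups = " ".join(f"({g})" for g in allowed_groups)
        lines.append(f"+ : {groups} : ALL")
    # Anything not explicitly allowed above is denied.
    lines.append("- : ALL : ALL")
    return "\n".join(lines) + "\n"
```

Keeping the final deny rule last matters: because pam_access stops at the first match, any allow rule placed after it would never take effect.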
Is centralized authentication for you?
Well, we can’t answer that specific question - though from a cloud infrastructure security standpoint, we do see a lot of benefit. More information on the code we released (including how to use it) can be found on GitHub. If you decide to deploy OpenLDAP, hopefully we’ll have saved you a few hours of frustration!
This was originally posted on the Threat Stack blog.
2016-10-25 16:00 +0000