A few weeks back I’d mentioned in passing that we’d migrated off Keycloak.
I got a question about why, so I’m putting down some rationale here. I respect open-source projects a lot, so this isn’t meant to throw shade on Keycloak in any way, but I believe in being transparent about these things so people can read alternative arguments and make better-informed tech decisions.
A few reasons why we moved away from Keycloak:
It has an API for integration, but its API reference offers precious little beyond the names of API resources and fields. There was never enough documentation to integrate a new feature, always necessitating trial and error.
Core authentication-related flows were the most likely to regress because writing tests against a remote API requires a lot of stubbing. With heavy stubbing involved, you really only end up writing tests that verify that your stubs are set up the way you expected them to be, and that made it very easy to break things because mistakes would not be caught by CI.
Keycloak pages like login or email verification are customized using a template language that was divergent from our main Frontend stack, and which no one was every particularly excited to write. They’d languish without updates and look awful compared to the rest of the site. This is also why some features were slow to happen, like multi-factor support, which we’ll finally be shipping next week.
Operationally, it was a black box wherein if something went wrong, no one knew how to fix it. We had a very near miss where a Heroku regression was crashing Keycloak on start up with an inscrutable backtrace, and which luckily manifested in our staging environment first, but which we knew was coming for prod on the next 24 hour dyno cycle. None of the people who’d chosen and integrated Keycloak had even the faintest idea how to diagnose the problem or fix it. In the end we were able to debug it by digging deeply in Keycloak source code, but were within hours of our login system being hard down with no fix in sight.
I wasn’t super confident of its security bonafides given the glimpses into its internals I’d seen. For example, it defaults to PBFKD2 for password hashing at 27,500 iterations, far below the 2023 recommendation from OWASP of 600,000 iterations. I admit to having only this one example and maybe everything else is above bar, but who knows.
It’s a huge Java app and not that cheap to run. It needed a performance dyno on Heroku due to high memory requirements, and between that and a database in staging and prod it was $800/month, or about half our total bill.
I’m usually against the NIH mindset, but sometimes it’s right. Looking up best practices for storing passwords is easier than looking up how Keycloak does it, the answer to which isn’t documented and only exists in Discourse forums or by reading source code. Integrating SSO isn’t that hard anymore thanks the widespread standardization around OIDC. Implementing flows for email verifications and password resets is a little more work, but you end up having to do most of it under Keycloak anyway if you want a custom theme.