The Let's Encrypt certificate revocation scare

([Security] Mar 10, 2020 17:20 UTC (Tue) (jake))


By Jake Edge

March 10, 2020

The [1]Let's Encrypt project has made real strides in helping to ensure that every web site can use the encrypted HTTPS protocol; it provides TLS certificates at no charge that are accepted by most or all web browsers. Free certificates accepted by the browsers were difficult to find prior to the advent of the project in 2014; as of the end of February, the project has [2]issued over a billion certificates. But a bug that was recently found in the project's handling of [3]Certificate Authority Authorization (CAA) put roughly 2.6% of the active certificates (about three million) at risk of immediate revocation. As might be expected, that caused a bit of panic in some quarters, but it turned out that the worst outcome was largely averted.

Let's Encrypt allows web-site operators to sign up for its service to sign their TLS certificates, so that browsers will recognize the certificate as valid. Let's Encrypt acts as a Certificate Authority (CA) and its keys are signed by a CA (IdenTrust) that is carried in the root certificate store for the browsers. That means a browser can follow the signature chain from a root certificate it trusts all the way to the certificate of the site, thus establishing the validity of the keys contained in the certificate.

In order for a site to get a certificate from Let's Encrypt, its administrator needs to show that they control the domain in question. That's typically done by placing a challenge value provided by Let's Encrypt either in the DNS information for the domain or at a URL that can be retrieved from the domain's web server. The administrator proves that they have the needed access, thus showing that the domain is under their control.
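The URL-based variant of that exchange can be sketched in a few lines of Python. This is a simplified illustration, not the real protocol: the function names are hypothetical, a plain dictionary stands in for the domain's web server, and the served value is just the token itself, whereas the actual ACME protocol binds the response to the requester's account key.

```python
import secrets


def issue_challenge():
    """CA side: create an unguessable token for the applicant to publish."""
    return secrets.token_urlsafe(32)


def challenge_path(token):
    """Path under which the token is expected to be served."""
    return "/.well-known/acme-challenge/" + token


def publish(site_files, token):
    """Administrator side: make the token retrievable from the web server.

    `site_files` is a dict standing in for the server's document tree.
    """
    site_files[challenge_path(token)] = token


def verify(site_files, token):
    """CA side: fetch the path from the domain and compare with what was issued."""
    served = site_files.get(challenge_path(token))
    return served is not None and secrets.compare_digest(served, token)
```

Only someone who can write to the server's document tree can make `verify()` succeed, which is the sense in which the challenge demonstrates control.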

Administrators who wish to restrict the kinds of certificates that can be issued for their domains can add CAA records to their DNS configuration. Those can be used to disallow certain providers, such as Let's Encrypt, from issuing certificates for a domain or portion of one. For example, the web site administrator at "subdomain.example.com" could not receive a certificate from Let's Encrypt or some other CA simply by adding a web page to the server they control if the administrator of the top-level "example.com" domain disallowed that with CAA records. Some sites may also want to restrict the CAs that can be used; some CAs offer services beyond just signing, which may be required for security or regulatory compliance.
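As a rough sketch of how a CA might evaluate those records, the Python below walks from the requested name up through its parent domains and applies the first CAA record set it finds. It is a simplification under stated assumptions: the `zone` dictionary is a stand-in for real DNS lookups, only the "issue" tag is handled (wildcard rules and issuer parameters are ignored), and the function names are made up for illustration.

```python
def relevant_caa(domain, zone):
    """Return the CAA records that apply to `domain`.

    Checks the full name first, then each parent domain in turn, and
    returns the first non-empty record set. An empty list means that no
    CAA restriction was found anywhere on the way up.
    """
    labels = domain.rstrip(".").lower().split(".")
    for i in range(len(labels)):
        candidate = ".".join(labels[i:])
        records = zone.get(candidate)
        if records:
            return records
    return []


def may_issue(domain, ca_id, zone):
    """Decide whether the CA identified by `ca_id` may issue for `domain`."""
    records = relevant_caa(domain, zone)
    issuers = [value for tag, value in records if tag == "issue"]
    if not issuers:
        # No "issue" records apply, so issuance is not restricted.
        return True
    # An "issue" value of ";" names no CA at all, so it never matches.
    return ca_id in issuers


# Hypothetical zone data: example.com only allows a CA called ca.example.net.
zone = {"example.com": [("issue", "ca.example.net")]}
allowed = may_issue("subdomain.example.com", "letsencrypt.org", zone)
```

With that zone data, the request for "subdomain.example.com" is refused for Let's Encrypt because the parent domain's record is inherited, which matches the scenario described above.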

So when Let's Encrypt is checking a site's validity, it needs to consult the CAA records as well, which turns out to be where the bug was. Let's Encrypt allows users to wait up to 30 days after proving they control the domain before requesting a certificate. But the CAA information needs to be checked within eight hours of issuance, so a recheck is done if needed. As [4]reported by Josh Aas, the executive director of the [5]Internet Security Research Group (the entity behind Let's Encrypt), the [6]Boulder CA server had a problem in the recheck code:

The bug: when a certificate request contained N domain names that needed CAA rechecking, Boulder would pick one domain name and check it N times. What this means in practice is that if a subscriber validated a domain name at time X, and the CAA records for that domain at time X allowed Let’s Encrypt issuance, that subscriber would be able to issue a certificate containing that domain name until X+30 days, even if someone later installed CAA records on that domain name that prohibit issuance by Let’s Encrypt.

Before the bug was fixed, certificates issued by Let's Encrypt based on a certificate request with multiple domains in it may not have had their domain's CAA records checked properly. Those affected certificates were thus not in compliance.
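The report describes the symptom but not the code behind it. Purely as an illustration, and not as a claim about Boulder's Go source, the Python sketch below shows one common way a loop can end up checking a single name N times: callables created in a loop that all share the loop variable. All names here are hypothetical.

```python
def build_rechecks_buggy(names, check):
    """Queue one recheck per name, but with a shared loop variable.

    Each lambda looks up `name` when it is called, not when it is
    created, so once the loop has finished they all see the last name.
    """
    rechecks = []
    for name in names:
        rechecks.append(lambda: check(name))
    return rechecks


def build_rechecks_fixed(names, check):
    """Queue one recheck per name, each bound to its own name.

    The default argument captures the current value of `name` at the
    time the lambda is created.
    """
    rechecks = []
    for name in names:
        rechecks.append(lambda name=name: check(name))
    return rechecks


def run(rechecks):
    """Run every queued recheck and collect the results."""
    return [recheck() for recheck in rechecks]
```

If `check` records the names it is given, running the buggy version over three names records the last name three times, while the fixed version records each name once. The number of checks looks right in both cases, which is part of what makes this class of bug easy to miss.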

That led to a [7]message on March 3 from a Let's Encrypt staff member saying that any of the affected certificates that had not been renewed by March 5 would be revoked. That would mean browsers would stop accepting the certificates from those three million sites. But by March 4, Aas [8]said that 1.7 million certificates had been renewed, which meant the existing, possibly invalid, certificates for those sites could be revoked without causing any problems. Of the remaining certificates, only 445 were for sites where the CAA record would disallow certificates being issued by Let's Encrypt; those were forcibly revoked, but the rest would not be revoked, at least immediately.

Let's Encrypt certificates are only issued for 90 days and must be renewed before the end of that time period. In the worst case, it means that around 1.3 million sites would have invalid certificates, at least in a technical sense, for up to three months. The [9]CA/Browser Forum (CA/B), which sets the standards that CAs need to comply with, does not consider certificates to be valid if the CAA records were not checked within eight hours before issuance. So even though none of those sites currently have a CAA record prohibiting the issuance of Let's Encrypt certificates, the existing set are not valid under the rules. The timeline set by CA/B for revocations is what drove the original March 5 deadline.
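The two time windows involved, the 30 days for which a domain validation can be reused and the eight hours for which a CAA check stays fresh, can be expressed as a pair of small predicates. This is a minimal sketch; the function names are hypothetical and the treatment of the exact boundary instants is an assumption.

```python
from datetime import datetime, timedelta, timezone

VALIDATION_REUSE_WINDOW = timedelta(days=30)  # figure from the article
CAA_MAX_AGE = timedelta(hours=8)              # figure from the article


def validation_usable(validated_at, issuing_at):
    """True if the domain validation is recent enough to issue against."""
    age = issuing_at - validated_at
    return timedelta(0) <= age <= VALIDATION_REUSE_WINDOW


def caa_recheck_needed(caa_checked_at, issuing_at):
    """True if the last CAA check is too old and must be repeated."""
    return issuing_at - caa_checked_at > CAA_MAX_AGE


validated = datetime(2020, 2, 1, 12, 0, tzinfo=timezone.utc)
issuing = validated + timedelta(days=29)
# The validation is still usable here, but the CAA check made at
# validation time is far older than eight hours and must be redone.
usable = validation_usable(validated, issuing)
recheck = caa_recheck_needed(validated, issuing)
```

The gap between the two windows is where the recheck, and therefore the bug, comes into play: any issuance more than eight hours after validation depends on the recheck being done correctly for every name.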

A Mozilla [10]bug report was filed by Aas to request an exemption from the requirement to revoke all of the affected certificates. Wayne Thayer [11]pointed Aas at the Mozilla [12]guidelines on revocation, which note that the company does not grant exceptions but recognizes that there may be times when "revoking misissued certificates within the prescribed deadline may cause significant harm". He also said that Mozilla requests some more information if a CA decides not to revoke the certificates.

Jacob Hoffman-Andrews [13]replied with additional details to explain why Let's Encrypt felt that it would be detrimental to do the bulk revocation. He said that users who encountered an error when browsing to an affected site would likely "look up instructions on how to bypass revocation checks"; having done so, they might well forget to re-enable those checks, so they would miss other revocations. It could also trigger "warning blindness", where users see so many warnings that they stop paying attention to them. But he noted a larger problem, as well:

By reviewing previous incident reports and analyzing our current situation, a common root cause of failure to timely revoke is that Subscribers are not able to replace certificates on the BR- [[14]baseline requirements] mandated timelines (24 hours and 5 days, depending on the issue).

Most Subscribers are not able to field round-the-clock incident response, so improving the speed of manual replacement processes cannot be the answer. Increasing public acceptance of revoked certificate errors also cannot be the answer, because that would undermine public faith in the web PKI. Reducing the incidence and scope of CA errors is an important part of the solution, and we have laid out some plans to that effect at [15]https://bugzilla.mozilla.org/show_bug.cgi?id=1619047. However, responsible systems design requires layered responses, and it is possible that we, or another CA, will have a similar-sized incident in the future despite our best practices and best efforts.

He said that Let's Encrypt plans to work on an open protocol to notify users of automated CAs of an imminent revocation in such a way that those certificates can be automatically renewed. In a world where even the smallest web sites have TLS certificates so that they can offer encrypted communications to their users, it is certainly important for them to be able to maintain their certificates—even without staff dedicated to handling such things. Those who are wondering can consult a [16]site where users of Let's Encrypt certificates can check whether they need an update.
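On the client side, such a notification would presumably feed into the same decision that already drives routine renewals. The Python below is a hypothetical sketch of that decision: the `revocation_pending` flag stands in for whatever signal a future protocol might deliver, and the 30-day renewal margin is an assumed value rather than something taken from any particular client.

```python
from datetime import datetime, timedelta, timezone

RENEW_BEFORE_EXPIRY = timedelta(days=30)  # assumed renewal margin


def should_renew(not_after, now, revoked=False, revocation_pending=False):
    """Decide whether a client should request a replacement certificate.

    Checking the expiration date alone misses revocation, because the
    date inside a certificate does not change when it is revoked.
    """
    if revoked or revocation_pending:
        return True
    return not_after - now <= RENEW_BEFORE_EXPIRY


now = datetime(2020, 3, 4, tzinfo=timezone.utc)
expires = now + timedelta(days=60)
# Not due for routine renewal, but flagged for upcoming revocation.
renew = should_renew(expires, now, revocation_pending=True)
```

A client that ran a check like this on a schedule could replace an affected certificate before the revocation took effect, with no administrator involved.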

The browser makers have the final authority on what root certificates they will accept, but they need to be cognizant of the impact removing one would have. If one or more of the big players decides that the steps taken by Let's Encrypt were not sufficient, they could remove the IdenTrust root certificate from their root store, though that would affect far more than just Let's Encrypt certificates. In that unlikely scenario, IdenTrust might decide (or be pressured) to revoke the Let's Encrypt certificates instead. No actions of that sort have been mooted—at least publicly. The havoc caused by such a move would be monumental.

One possible downside of the widespread availability of gratis certificates from Let's Encrypt is the creation of a monoculture. Concentrating TLS certificate issuance in a single organization might be worrisome, whether it is Let's Encrypt or one of the commercial providers. We are far from that situation now, but this incident does show that a problem found in a large number of issued certificates may leave any CA in an unenviable position—certificates that do not expire for a year or more would only add to the mess.

Overall, Let's Encrypt did an excellent job in a rather compressed time frame to identify, fix, and partly mitigate what was, in truth, just a technical violation of the specifications for CAs. It seems rather unlikely that many—perhaps any—of the remaining unrevoked certificates were actually issued for domains that they should not have been. That is not to say that technicalities should be ignored, but it is clear that sometimes there are overarching considerations as well. The bug and the problems it caused are unfortunate, for sure, but things seem to be moving in the right direction at this point.



[1] https://letsencrypt.org/

[2] https://letsencrypt.org/2020/02/27/one-billion-certs.html

[3] https://letsencrypt.org/docs/caa/

[4] https://community.letsencrypt.org/t/2020-02-29-caa-rechecking-bug/114591

[5] https://www.abetterinternet.org/

[6] https://github.com/letsencrypt/boulder

[7] https://community.letsencrypt.org/t/revoking-certain-certificates-on-march-4/114864

[8] https://community.letsencrypt.org/t/2020-02-29-caa-rechecking-bug/114591/3

[9] https://cabforum.org/

[10] https://bugzilla.mozilla.org/show_bug.cgi?id=1619179

[11] https://bugzilla.mozilla.org/show_bug.cgi?id=1619179#c1

[12] https://wiki.mozilla.org/CA/Responding_To_An_Incident#Revocation

[13] https://bugzilla.mozilla.org/show_bug.cgi?id=1619179#c7

[14] https://cabforum.org/baseline-requirements-documents/

[15] https://bugzilla.mozilla.org/show_bug.cgi?id=1619047

[16] https://checkhost.unboundtest.com/

The Let's Encrypt certificate revocation scare

While I found it annoying initially, I must applaud Let's Encrypt's original decision to limit certificate validity to just 90 days. That has practically forced everyone to automate the renewal process. And it obviously mitigates problems like this as well, and even more so after automated renewal on revocation.

The Let's Encrypt certificate revocation scare

Why doesn't certbot periodically check whether its current certificate has been revoked (or, even better, is on an about-to-be-revoked list) and try to get a new one if so? Then when things like this happen Let's Encrypt doesn't need to rely on millions of system admins to be proactive and manually renew their certificates. It can simply add all the offending certificates to a list and have the tools handle the problem.

The Let's Encrypt certificate revocation scare

Like [1]https://github.com/certbot/certbot/pull/7829 you mean ;-)

I believe the idea of an "about to be revoked" status is also being discussed, but that obviously needs changes to ACME to provide a way to report that status to clients.

[1] https://github.com/certbot/certbot/pull/7829

The Let's Encrypt certificate revocation scare

This change is part of certbot 1.3 which was released 7 days ago so maybe it's already working on many production machines.

The new version is available via Fedora's updates-testing repos already. I hope CentOS/RHEL (via Fedora EPEL) will follow soon-ish.

The Let's Encrypt certificate revocation scare

I keep Certbot up to date on my systems by installing it from PyPI; it's simple to do and avoids delays in getting new features like this (or important bug fixes).

The Let's Encrypt certificate revocation scare

Even early on, a typical distro install put in a cron job to check the status of the certificate nightly.

The Let's Encrypt certificate revocation scare

That typically only checks for the expiration date of the certificate itself, which won't change if it's revoked.

The Let's Encrypt certificate revocation scare

Unless I'm mistaken when certbot checks the expiration it also reviews the revocation list that updates when certbot connects. I did this just the other day on a new server and I remember the step where it downloads and checks the revoke list.

That was one of their smartest decisions with Let's Encrypt, right behind the 90 day certs, in that revoke lists were fully integrated into the process.

The Let's Encrypt certificate revocation scare

This is a new feature, merged and released last week ( [1]https://github.com/certbot/certbot/pull/7829 ).

[1] https://github.com/certbot/certbot/pull/7829

The Let's Encrypt certificate revocation scare

> That is not to say that technicalities should be ignored, but it is clear that sometimes there are overarching considerations as well

Yes. The rules are there to stop accidents (or worse) happening. The right response when something bad happens anyway, as it will, is to mitigate the damage and decide whether and how to beef up the rules and procedures. Somebody didn't do their preflight checks, that means the plane shouldn't be up there but does not mean we shoot the plane out of the sky.

The Let's Encrypt certificate revocation scare

My initial impression was that LE will redo CAA checks at current time and revoke only these certificates that don't match current CAA records.

Unfortunately they revoked all these certs regardless of current CAA state. Email notification failed (I guess because emails subscription was disabled due to LE sending other, useless in our case, emails).

When we noticed this then reissuing required certificates took double digit number of hours due to very low limits (only 300 orders per 3h and the queue was already full of other pending orders; yes we have tons of certs on single account).

Not happy. It wasn't perfectly handled by LE and by us here :/

The Let's Encrypt certificate revocation scare

> My initial impression was that LE will redo CAA checks at current time and revoke only these certificates that don't match current CAA records.

The CA/B requirements specifically prohibit this behavior. CAA checks, as the article notes, must be performed within 8 hours of issuance. DNS has no history, so the only way to be sure that a certificate was CAA-compliant at time of issuance is to check at time of issuance.

Realistically, nobody is going around changing their CAA settings every time they want to issue a certificate, so this is unlikely to matter. But CA/B does not allow CAs to say "eh, it's probably fine." If the certificate is non-compliant, it's supposed to be revoked.

I've also heard people suggest that CAA should be checked by browsers instead of or in addition to CAs, but this is basically negative DANE, and negative DANE requires the end user to have real DNS (i.e. they should be able to successfully query arbitrary DNS records without interference by their ISP or anyone else). If they don't, then you either have to hard-fail the connection (which makes the modern web unusable for those users) or else you allow any MitM attacker to disable the check by blocking the lookup (so you might as well not do the check at all).

Unfortunately, many users do not have real DNS, because many ISPs and governments "know" that you only "have to" resolve A and AAAA records (and perhaps one or two others) for the "internet" to "work" as far as a typical non-technical end user is concerned, so if you look up a TXT record or something more exotic, it may simply fail or return nonsense. This is also one of the motivating factors in DoH deployment - most intermediaries take a "hands off" approach to anything that looks like an HTTPS stream, because you can't do much with them anyway.

> When we noticed this then reissuing required certificates took double digit number of hours due to very low limits (only 300 orders per 3h and the queue was already full of other pending orders; yes we have tons of certs on single account).

I would like to respectfully suggest that, if you really have that many certificates, and really need to process them that quickly, and both of those things are hard "my business is in danger of going broke" requirements, you might consider paying for them, or (if possible) changing your system in such a way that you can relax one or both of those requirements. Let's Encrypt is a free service. It does not come with a Service-Level Agreement, which it sounds like you need.

The Let's Encrypt certificate revocation scare

Ok, CA/B requirements explain why it was done this way.

About limits: we would have gone multi-account and not had a limit problem at all, but [1]https://letsencrypt.org/docs/rate-limits/ recommends "using one account for many customers", which turns out to be a big trap.

[1] https://letsencrypt.org/docs/rate-limits/

The Let's Encrypt certificate revocation scare

> Email notification failed (I guess because emails subscription was disabled due to LE sending other, useless in our case, emails).

I've been using LE certs for years, and I've never once received an email from them which wasn't timely and appropriate and specifically about my usage of their service. If you're blocking email from LE, that's quite unfortunate.

The Let's Encrypt certificate revocation scare

Emails about expiring certificates are useless here.

Emails about certificates being revoked are very important.

If you opt out of the first type of email, you don't get the second type either (at least I didn't).

The Let's Encrypt certificate revocation scare

That's fair, I suppose, except in my case if I get an email about an expiring certificate it's because my automatic renewal process is failing and I want to know about that.

The only other 'expiring certificate' messages I get which are not useful are those for certificates I am no longer using; I have not investigated to see if there's a way to tell LE that a certificate is no longer in use and should just be immediately expired and forgotten.

The Let's Encrypt certificate revocation scare

My company was able to identify the ~1000 certs of ours that were affected and get them renewed before the (second) deadline. It wasn't easy and I know others (such as Akamai) were having difficulty. I'm glad LE decided to revisit the original decision.

The Let's Encrypt certificate revocation scare

This case should illustrate that centralization is evil regardless of the best intentions of its creation.

Some authority with the ability to disrupt a two-digit percentage of the internet... Is this a dystopian future? No, it's a grim PKI reality.

The Let's Encrypt certificate revocation scare

Has anyone considered creating an EU based equivalent to Let's Encrypt?

The Let's Encrypt certificate revocation scare

BuyPass ( [1]https://www.buypass.com/ssl/products/acme ) is in Norway. There are probably more, Wikipedia lists 8 ACME-compatible cert providers.

[1] https://www.buypass.com/ssl/products/acme
