Automatically finding and fixing insecure HTTP links
At the end of last month I attended the KDE privacy goal sprint in Leipzig. Together with Sandro I continued to look into tooling for identifying and fixing insecure HTTP links, an issue I have written about before. The result of this can be found in D19996.
Identifying insecure links
The first tool we built is httpcheck, a scanner for http: URLs in whatever files you point it to. It is optimized for high speed and therefore doesn't do any online validation.
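The core of such a scanner can be sketched as follows; the regular expression and all names here are illustrative stand-ins, not the actual httpcheck implementation:

```python
import re
import sys

# Plain-text pattern for insecure links; purely offline, no network access.
HTTP_URL = re.compile(r'\bhttp://[^\s"\'<>)]+')

def scan_file(path):
    """Yield (line number, URL) pairs for every http: link in the file."""
    with open(path, encoding="utf-8", errors="replace") as f:
        for lineno, line in enumerate(f, start=1):
            for match in HTTP_URL.finditer(line):
                yield lineno, match.group(0)

if __name__ == "__main__":
    found = False
    for path in sys.argv[1:]:
        for lineno, url in scan_file(path):
            found = True
            print(f"{path}:{lineno}: insecure link {url}")
    sys.exit(1 if found else 0)
```

Keeping the check purely textual, with no network round trips, is what makes it fast enough to run over an entire source tree or inside a unit test.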
Obviously something like this can never be perfect, so it has a few features to deal with the common problems we encountered:
- There is a global exclusion list for known URIs, e.g. those used in XML namespaces, where the http: is part of the identifier and not resolved as a network address (see also the last post on this issue).
- There is an exclusion list for services known to not support transport encryption (yes, those still exist in 2019), as well as for URLs that would just produce an unmanageable amount of warning noise for now (that’s mainly the gnu.org addresses commonly found in license headers).
- Like other code checkers, this supports inline and per-module overrides to suppress warnings. It is for example quite important to not touch code that deals with adjusting https: URLs and therefore might validly contain fragments of what looks like an insecure URL.
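The exclusion and override handling could be modelled along these lines; the list entries and the marker string are made-up stand-ins, not the actual httpcheck configuration:

```python
# Illustrative exclusions; real entries live in httpcheck's own lists.
EXCLUDED_PREFIXES = (
    "http://www.w3.org/",            # XML namespace URIs: identifiers, not links
    "http://www.gnu.org/licenses/",  # license headers, excluded to reduce noise
)

# Hypothetical inline override marker, analogous to e.g. "NOLINT" comments.
OVERRIDE_MARKER = "@ignore:insecure-urls"

def is_flagged(url, line):
    """Return True if the URL found on this line should be reported."""
    if OVERRIDE_MARKER in line:
        return False  # explicitly marked as intentional
    return not any(url.startswith(prefix) for prefix in EXCLUDED_PREFIXES)
```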
A tool like this is mainly useful to prevent new issues from being introduced, and there are two ideas on how to deploy it:
- As a unit test injected by ECM into all projects (as it’s currently done for the appstream test, and that’s also why the code for this is in ECM).
- As a commit hook, similar to the license checks run at commit time.
Before rolling this out we need to fix the current code base, though, so we don't drown in warnings and test failures.
Automatically fixing insecure links
And that brings us to the second tool, httpupdate, which is meant to automate the migration to https:. It consumes the same overrides and exclusion lists as httpcheck, so it won't touch anything explicitly marked as intentionally using http:. It also doesn't simply replace http: with https:, but first validates that the corresponding service actually supports secure connections.
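That validation step can be sketched like this, with the probe made injectable so the upgrade logic stays testable offline; none of these names come from the actual httpupdate code:

```python
from urllib.parse import urlsplit, urlunsplit
from urllib.request import Request, urlopen

def https_supported(url, timeout=10):
    """Probe whether the https: variant of a URL actually answers."""
    try:
        with urlopen(Request(url, method="HEAD"), timeout=timeout) as response:
            return response.status < 400
    except OSError:
        return False  # connection refused, TLS failure, dead host, ...

def upgrade_url(url, probe=https_supported):
    """Return the https: version of an http: URL if the service supports
    it, otherwise the URL unchanged. `probe` is injectable for testing."""
    parts = urlsplit(url)
    if parts.scheme != "http":
        return url
    secure = urlunsplit(("https",) + tuple(parts[1:]))
    return secure if probe(secure) else url
```

With the default probe this performs a real HEAD request against the https: variant, which is also where dead hosts and vanished services show up as failures.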
A side-effect of this is that it also identifies dead links or no longer existing services, and therefore helps to maintain e.g. our documentation.
Of course this is also imperfect and the result always needs manual review, but it nevertheless massively speeds up the process compared to doing everything by hand.
How much does this help with the overall privacy of our users though? How often do you click on links in the documentation or CMake output, let alone in license headers? And even then, don't HSTS-enabled browsers and properly configured web servers redirect to secure connections anyway? In most cases this is probably true, and the practical impact is limited.
However, during the test runs of the tools at the sprint we found two possible data leaks this way (one when using a URL shortening service, one for a pastebin service), among hundreds of probably less impactful insecure links. So I think this is worth it, even if it just helps us spot a potential high-impact issue among the many harmless ones.
As mentioned above, before it makes sense to roll out the continuous checks we need to fix the current state. That means going through all repositories to see what these tools find, fixing things, and improving the tools and their exclusion lists along the way. So there's plenty of opportunity to help :)