One obvious aspect of KDE’s privacy goal is eliminating all network connections that don’t use transport encryption. That’s not as straightforward to ensure as it may sound, though: it’s easy to have a long-forgotten HTTP link in a rarely used dialog that should have been changed to HTTPS many years ago. How do we find all these cases?
First of all, this is not about intentional or malicious attempts to bypass or weaken transport encryption at any level, but about finding our own mistakes. That is, simple typos that drop the crucial ‘s’ after ‘http’, or legacy code from a time before the corresponding service even supported transport encryption.
This is also not about identifying servers that only offer weak or otherwise broken transport security, and communicating that to the user as we are used to from web browsers. All of that needs to be looked at too of course, but that’s a different task.
Naively, searching the source code for http: and replacing it with https: would be a good start. However, it’s a bit more complicated than that, mainly because http URIs are used both as abstract identifiers and as addresses of network resources. A common case of the former are XML namespaces: those are never actually retrieved over the network, so http URIs are not a problem there (on the contrary, changing them might confuse the code dealing with them). In the latter case the URIs are actually URLs used for network operations, and those we need to fix.
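As a rough illustration, such a grep pass over a source tree could look like the sketch below. The file extensions and the list of excluded identifier-only hosts are just examples to get started with, not an exhaustive or authoritative set:

```shell
# Sketch: find plain http:// URLs in source files, while filtering out
# well-known hosts that typically appear as identifiers only
# (XML namespaces, DTDs, license texts). Extend the lists as needed.
find_http_urls() {
  grep -rn 'http://' \
    --include='*.cpp' --include='*.h' --include='*.ui' --include='*.desktop' \
    "$1" \
    | grep -vE 'www\.w3\.org|purl\.org|gnu\.org/licenses'
}
# Usage: find_http_urls path/to/checkout
```

Every remaining hit still needs manual review, since only a human can tell a genuine download URL from, say, an example URL in documentation.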
Still, a bit of crude grep work can already turn up quite a few useful results. These can be clickable links in code (D16904, R209:33447a) or documentation (D17262), downloaded content (D7414, D17223), or access to online services (D7408, D16925, D16946). Many hits, however, are part of code documentation, automatic tests or license information (R1007:aff001), which are less severe in their impact. Sometimes URLs also appear in translations, so those need to be checked too.
Another place to look for http: strings is the compiled binaries. That’s less complete, but seems to have a much smaller false positive rate. A simple approach is grep-ing through the output of strings and strings -e l (the latter decodes 16-bit little endian Unicode strings as used by QStringLiteral), filtering out common harmless URIs just as in the source code search.
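A minimal sketch of such a binary scan, combining both strings encodings before filtering (the exclusion list is again just an example):

```shell
# Sketch: extract both plain 8-bit strings and UTF-16LE strings
# (as produced by QStringLiteral) from a binary, then look for
# plain-text http:// URLs, dropping identifier-only hosts.
scan_binary() {
  { strings "$1"; strings -e l "$1"; } \
    | grep 'http://' \
    | grep -vE 'www\.w3\.org|purl\.org'
}
# Usage: scan_binary /path/to/some/binary
```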
An entirely different approach to identifying cleartext connections is observing network activity at runtime. Tools like tcpconnect from the bcc suite seem to be a good starting point, as they allow continuous monitoring without noticeable impact, unlike e.g. capturing the entire network communication with Wireshark.
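One way this could be wired up: tcpconnect prints one line per connect() call with the destination port in the last column, so a small filter can flag ports of typically unencrypted protocols. The port list below is an assumption for illustration, and the tcpconnect install path varies by distribution:

```shell
# Sketch: flag outgoing connections to ports that usually carry
# unencrypted traffic (80 = HTTP, 21 = FTP); extend the list as needed.
flag_cleartext() { awk '$NF == 80 || $NF == 21'; }

# Usage (needs root; path to tcpconnect varies by distribution):
#   sudo /usr/share/bcc/tools/tcpconnect | flag_cleartext
```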
This is a perfect topic to get started with, I think: fixing http: links is as easy as it gets, and yet it has a relevant impact on the privacy of our users. But it doesn’t stop there, as we also need to build the tools to identify these issues more reliably. There isn’t much yet in terms of tooling, so a simple script would already be an improvement (e.g. to automatically check the KIO search provider desktop files), but if you are into elaborate runtime tracing techniques like those used by the bcc tools, here’s a real-world problem to apply them to :)
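To give an idea of how small such a script could be, here is a hypothetical sketch of the search provider check mentioned above. It assumes the desktop files carry the URL template in a Query= key; the install path shown in the usage comment is an assumption and differs between distributions:

```shell
# Sketch: list search provider desktop files whose Query= template
# still uses plain http:// instead of https://.
check_searchproviders() {
  grep -lE '^Query(\[[^]]*\])?=http://' "$1"/*.desktop
}
# Usage (install path is an assumption, varies by distribution):
#   check_searchproviders /usr/share/kservices5/searchproviders
```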