It’s already been two months since I last wrote a summary of recent developments in KDE Itinerary, so here is what happened in October and November. With the 18.12 application release coming up shortly, that’s also largely what you can expect in there.
Probably the biggest visible change is the introduction of automatic trip grouping. That is, multiple elements (flights, train reservations, hotel bookings, etc.) belonging to the same trip are grouped together in the timeline and can be collapsed for a better overview.
This might sound simple (and for a human it is), but finding a reliable way to automatically group things, and to automatically name the resulting trip, is actually quite tricky. Some of the challenges include:
- Trips that don’t return to the starting point. It’s not uncommon for larger urban areas to have multiple airports that are all viable “home airports”. E.g. TXL -> CDG -> SXF should be detected as a trip group, despite technically not being a loop. A distance threshold addresses this to some extent, but would need to be fairly large to also work e.g. for all of London’s airports.
- Distinguishing between changeover stops and the actual destinations. This matters for picking good default names for trips. We use hotel bookings as an indication for this; the length of a stay between two location changes might also be interesting to look at. Simply assuming that a trip is symmetric, however, does not work.
- Incomplete data. This can be just not yet imported or booked elements of a trip, or elements that simply don’t exist. The Randa demo data set shows this for example with the unidirectional bus trip from Randa to Zermatt, which is followed by a (missing) hike back. A strict connectivity search would detect that gap and wrongly interrupt the trip there. By also looking for the matching reservation numbers of the enclosing flights we can still make this work.
- Missing city names. This is again mostly a problem for naming, as cities are often the best level of detail for describing the destination. That is, “Munich” makes sense, while “Franz Josef Strauss” (fallback to airport name) or “Germany” (fallback to country name) are sub-optimal, the latter particularly if we started in that country.
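To make the first challenge above more concrete, here is a minimal sketch of loop detection with a distance threshold. It is not how KDE Itinerary implements this; the `Leg` type, the `group_trips` function and the 50 km default radius are all assumptions made for illustration, and it deliberately ignores the other challenges (changeover stops, gaps in the data, naming).

```python
from dataclasses import dataclass
from math import asin, cos, radians, sin, sqrt


@dataclass
class Leg:
    """One location change, e.g. a flight (hypothetical type for this sketch)."""
    name: str
    origin: tuple[float, float]       # (latitude, longitude) in degrees
    destination: tuple[float, float]  # (latitude, longitude) in degrees


def distance_km(a, b):
    """Great-circle distance between two (lat, lon) points (haversine formula)."""
    lat1, lon1 = map(radians, a)
    lat2, lon2 = map(radians, b)
    h = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * 6371.0 * asin(sqrt(h))


def group_trips(legs, home_radius_km=50.0):
    """Group time-ordered legs into trips.

    A trip is considered complete once a leg ends within home_radius_km
    of where the first leg of the trip started. Legs after the last
    completed trip are left ungrouped.
    """
    trips, current = [], []
    for leg in legs:
        current.append(leg)
        if distance_km(current[0].origin, leg.destination) <= home_radius_km:
            trips.append(current)
            current = []
    return trips


# Approximate airport coordinates, for the example only.
TXL = (52.5597, 13.2877)
CDG = (49.0097, 2.5479)
SXF = (52.3800, 13.5225)
MUC = (48.3538, 11.7861)

legs = [
    Leg("TXL-CDG", TXL, CDG),
    Leg("CDG-SXF", CDG, SXF),  # returns to Berlin, but to a different airport
    Leg("TXL-MUC", TXL, MUC),  # no return leg imported yet
]
trips = group_trips(legs)
```

With the default radius the first two legs form one trip, as TXL and SXF are roughly 25 km apart, while the one-way flight to Munich stays ungrouped. Lowering the radius to 10 km breaks the grouping, which shows why the threshold has to be chosen fairly generously.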
As you can imagine, this needs a lot of fine tuning to produce useful results, and possibly additional search and naming strategies for cases not yet considered. So this is a good place to help: if you have particularly complex trips or other cases where the current approach could produce better results, or just ideas on how to improve this, please let us know.
That’s not all that is new though. Nicolas started to work on adjusting the screen brightness when getting ticket barcodes scanned on your phone. So far we have Linux/Solid support for Plasma Mobile; Android support is still to come. This makes it less likely that you are the unpopular person stalling the boarding queue ;-)
- Post-processing of extracted data can now employ libphonenumber. Given a partial address, e.g. of a hotel or a restaurant, this allows us to determine the country from an international phone number or, the other way around, to make a local phone number internationally dialable when we know the country. There is more that libphonenumber can do, such as determining the city or timezone from a given number, but that still has to be integrated.
- Custom extractors can now also specify filter expressions on proprietary barcode content. This allows us for example to select the SNCF ticket extractor without having the email context for such a PDF document. So far this was only possible on standard barcodes like IATA BCBP or UIC 918.3.
- A new convenience method for extractor scripts allows turning the often found Google Maps URLs into JSON-LD geo coordinate objects.
- We switched to a newer variant of the ZXing barcode decoding library. This fixes a few cases of failed detection of PDF417 codes with small module sizes (common in boarding pass PDF files), and as a very welcome side-effect increases performance and fixes a memory leak on failed barcode detections.
- The airport identification code now considers a few more alternative transliterations of non-ASCII characters. This allows us to properly identify more airports from their (translated) human-readable names and look up information from the Wikidata knowledge database about them.
- Nicolas implemented generic airline name extraction from Apple Wallet boarding passes.
- Weather forecast in KDE Itinerary now also works for negative coordinates.
- schema.org-based value classes got a shared null state optimization, and avoid detaching their internal state when their property setters are called with unchanged values (see separate post on this). This saves a significant amount of memory allocations during post-processing of extracted data.
- We reduced the need for expensive image scaling operations in barcode extraction from PDF files and improved the heuristics on which graphics found in a PDF could even be a valid barcode. Combined with the better ZXing variant mentioned above, this cuts the runtime of the extractor test suite by almost half. PDF extraction performance is important as the KMail plug-in applies this much more aggressively than in the previous release.
- Importing and pasting content into KDE Itinerary got streamlined by supporting remote URLs too, and by auto-detecting content. So there’s only a single workflow for all supported content types now.
- The timeline model got a vastly improved test system, which enabled us to fix a number of merge failures when importing data that had resulted in duplicated or otherwise wrong elements in the timeline.
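As an illustration of the Google Maps URL conversion mentioned in the list above, the following is a rough sketch of the idea. It is not the actual extractor script API: the function name `geo_from_maps_url` is made up for this example, and it only handles two common URL shapes (an `@lat,lon` path segment, and a `q` or `ll` query parameter holding a coordinate pair). The output uses the schema.org `GeoCoordinates` type.

```python
import re
from urllib.parse import parse_qs, urlparse

# A "latitude,longitude" pair of decimal numbers, either of which may be negative.
_PAIR = r"(-?\d+(?:\.\d+)?),(-?\d+(?:\.\d+)?)"


def _find_pair(parsed):
    """Return a regex match for the coordinate pair in the URL, or None."""
    # Path forms like "/maps/@lat,lon,15z" or "/maps/place/Name/@lat,lon,17z".
    match = re.search("@" + _PAIR, parsed.path)
    if match:
        return match
    # Query forms like "?q=lat,lon" or "?ll=lat,lon".
    query = parse_qs(parsed.query)
    for key in ("q", "ll"):
        for value in query.get(key, []):
            match = re.fullmatch(_PAIR, value.strip())
            if match:
                return match
    return None


def geo_from_maps_url(url):
    """Turn a map URL into a JSON-LD GeoCoordinates dict, or None if that fails."""
    match = _find_pair(urlparse(url))
    if match is None:
        return None
    lat, lon = float(match.group(1)), float(match.group(2))
    # Reject values that cannot be valid coordinates.
    if not (-90.0 <= lat <= 90.0 and -180.0 <= lon <= 180.0):
        return None
    return {"@type": "GeoCoordinates", "latitude": lat, "longitude": lon}


sydney = geo_from_maps_url("https://www.google.com/maps/@-33.8688,151.2093,15z")
berlin = geo_from_maps_url("https://maps.google.com/?q=52.52,13.405")
by_name = geo_from_maps_url("https://maps.google.com/?q=Berlin")  # no coordinates
```

Note that the pattern accepts a leading minus sign, so locations in the southern and western hemispheres work as well. URLs that only carry a place name, like the last one, are left alone rather than guessed at.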
Again a big thanks to everyone who donated test data. This helps immensely, so please keep it coming!
If you want to help in other ways too, see our Phabricator workboard for what’s on the todo list, for coordinating work and for collecting ideas. For questions and suggestions, please feel free to join us on the KDE PIM mailing list or in the #kontact channel on Freenode.