In the last post I described how we handle public transport line metadata in KPublicTransport, and what we use that for. Here’s how you can help review and improve this information in Wikidata and OpenStreetMap, where it benefits not only KPublicTransport but everyone.

OpenStreetMap

Let’s consider Berlin’s subway line U1 as an example of how things are ideally represented in OSM and Wikidata.

As a single bi-directional line, this involves three relevant elements in OSM: a route_master relation for the line as a whole, and two route relations, one for each direction.

We have some heuristics to merge route elements without a route_master as well, but things get a lot more reliable if this is set up properly and all route elements are members of the corresponding route_master relation.

On the route_master relation, there are a number of fields we use:

  • ref for the (short) line name.
  • colour for the line color in #RRGGBB notation. Note the British English spelling.
  • Any of route_master, line, passenger or service to determine the mode of transportation (bus, tram, subway, rapid transit, regional or long-distance train, etc.). This is unfortunately not always an exact science: the lines between those can be blurry, and there are various exotic special cases (e.g. the San Francisco cable cars or the Wuppertal Schwebebahn). See the OSM wiki for details.

Editing these fields (called “tags” in OSM speak) is fairly straightforward compared to setting up entire routes from scratch, and most of the time it is all that’s needed.
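When reviewing several lines at once, the checks above can be sketched in a few lines of Python. This is only an illustration of the tags described here, not the code KPublicTransport runs; the function name, the messages and the sample tag set are assumptions made for this example.

```python
import re

# The #RRGGBB notation mentioned above.
HEX_COLOUR = re.compile(r"#[0-9A-Fa-f]{6}")

# Tag keys that may carry the mode of transportation, as listed above.
MODE_KEYS = ("route_master", "line", "passenger", "service")


def review_route_master(tags):
    """Return a list of problems found in a route_master relation's tags.

    `tags` is a plain dict of OSM tag keys to values (hypothetical input format).
    """
    problems = []
    if not tags.get("ref"):
        problems.append("missing ref")

    colour = tags.get("colour")
    if colour is None:
        # The American spelling is an easy mistake to make.
        if "color" in tags:
            problems.append("uses 'color' instead of 'colour'")
        else:
            problems.append("missing colour")
    elif not HEX_COLOUR.fullmatch(colour):
        problems.append("colour is not in #RRGGBB notation")

    if not any(tags.get(key) for key in MODE_KEYS):
        problems.append("no tag to determine the mode of transportation")
    return problems


# Illustrative tag set for a subway line, using the OSM colour quoted in this post.
u1_tags = {"type": "route_master", "route_master": "subway", "ref": "U1", "colour": "#52B447"}
print(review_route_master(u1_tags))  # → []
```

An empty list means nothing obvious is missing; anything else is a hint on what to look at in the OSM editor.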

Wikidata

Next to the information in OSM, we ideally have two related items in Wikidata for each line:

  • One item representing the line, an instance of a railway line, tram line, etc., or even better, a more specific subclass thereof. Q99691 in the above example.
  • One item representing a set of lines belonging to the same “product”, i.e. typically a network of lines of the same mode of transportation in a given city. This is often an instance of rapid transit, tram system, etc., or anything in that type hierarchy. Q68646 for the Berlin subway network in our example.

On those items we are particularly interested in the following properties:

Linking Wikidata and OpenStreetMap

Once we have elements in both OSM and Wikidata, we still need to link them together. Sometimes that’s all that’s missing, which makes it a particularly easy contribution.

On the OSM side, there is a wikidata tag that should be set on the route_master relation and contain the Wikidata item identifier (e.g. Qxxx) of the item representing that line. In some cases the tag links to the product item instead, which is not ideal but better than nothing if there are no per-line items in Wikidata.

On the Wikidata side, there is the OSM relation ID (P402) property that should be set on the item representing a line, and contain the relation number from the OSM side (58767 in our example).
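The two-way link can be verified mechanically. The following is a minimal sketch under some assumptions: the function name is made up, and both inputs are simplified dicts rather than the actual data formats of the OSM or Wikidata APIs.

```python
def check_link(osm_relation_id, osm_tags, wikidata_id, wikidata_claims):
    """Report which direction of the OSM <-> Wikidata link is missing or wrong.

    osm_tags:        dict of tags on the route_master relation
    wikidata_claims: dict mapping a property ID to a list of string values
                     (a simplified, hypothetical representation)
    """
    issues = []
    if osm_tags.get("wikidata") != wikidata_id:
        issues.append("OSM wikidata tag does not point to the item")
    if str(osm_relation_id) not in wikidata_claims.get("P402", []):
        issues.append("Wikidata P402 does not point to the relation")
    return issues


# The identifiers of the U1 example from this post, linked in both directions.
print(check_link(58767, {"wikidata": "Q99691"}, "Q99691", {"P402": ["58767"]}))  # → []

# Only the OSM side is set, so the Wikidata side still needs the P402 property.
print(check_link(58767, {"wikidata": "Q99691"}, "Q99691", {}))
```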

Conflicting Information

Even with everything already set up and linked properly, reviewing the individual pieces of information is useful. Since some of the information is duplicated between Wikidata and OSM, it might go out of sync. Colors and the mode of transportation seem to be particularly prone to that.

In our example this is indeed the case for the color (at the time of writing):

  • OSM has it set to #52B447
  • Wikidata has it set to #65BD00

At least both are a similar shade of green, but sometimes the differences are much more significant than this.
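To get a rough feeling for how far apart two such values are, one can compare them numerically. The sketch below uses the plain Euclidean distance in RGB space, which is a simple stand-in chosen for this example and not a perceptually accurate color difference; the function names are made up.

```python
def parse_hex(colour):
    """Turn '#RRGGBB' into a tuple of three integers in the range 0-255."""
    value = colour.lstrip("#")
    return tuple(int(value[i:i + 2], 16) for i in (0, 2, 4))


def rgb_distance(a, b):
    """Euclidean distance between two #RRGGBB colors (0 means identical)."""
    return sum((x - y) ** 2 for x, y in zip(parse_hex(a), parse_hex(b))) ** 0.5


print(parse_hex("#52B447"))  # → (82, 180, 71)
print(parse_hex("#65BD00"))  # → (101, 189, 0)
print(round(rgb_distance("#52B447", "#65BD00"), 2))  # → 74.05
```

For comparison, the largest possible distance (black versus white) is about 441.67, so the two greens are different but still fairly close. What counts as “too different” is a judgment call, and checking against the operator’s official material is the better way to decide which value is right.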

Line Logos

The most visually impactful parts are the line and product logos. Those are Wikimedia Commons files referenced via the logo image (P154) property in Wikidata. In our example that’s Berlin_U1.svg and U-Bahn.svg.

KPublicTransport’s code checks the following criteria before considering a logo for use:

  • The file type should be SVG or PNG. SVG is preferred, as that has the best chance to scale down reasonably when displayed in an application.
  • The file size should not exceed 10 kB, as we need to download this on demand, likely on a mobile data link. For logos done in SVG, which usually have a fairly simple structure, this is rarely a problem, but there are files out there in the multi-100 kB range as well that could benefit from optimization.
  • A license that does not require individual attribution, such as CC0. That covers the majority of files; however, a number of them use variations of the CC-BY license family as well. Sadly we cannot use those, as providing appropriate individual attribution to the author is simply not practical for every tiny icon we display in a list somewhere.
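Expressed as code, these three criteria could look roughly like the sketch below. It is not KPublicTransport’s actual implementation: the function name, the exact byte limit, the license labels and the sample values are all assumptions for illustration.

```python
# 10 kB limit from the list above; the exact byte count is an assumption.
MAX_LOGO_BYTES = 10_000
ACCEPTED_EXTENSIONS = (".svg", ".png")
# Illustrative allow-list only. The post names CC0 as an example;
# extend it with other licenses that need no individual attribution.
LICENSES_WITHOUT_ATTRIBUTION = {"CC0"}


def logo_problems(file_name, size_in_bytes, license_name):
    """Return the reasons why a logo file would not be considered for use."""
    problems = []
    if not file_name.lower().endswith(ACCEPTED_EXTENSIONS):
        problems.append("file type is not SVG or PNG")
    if size_in_bytes > MAX_LOGO_BYTES:
        problems.append("file is larger than 10 kB")
    if license_name not in LICENSES_WITHOUT_ATTRIBUTION:
        problems.append("license may require individual attribution")
    return problems


# Made-up file names, sizes and license labels, for illustration only.
print(logo_problems("example_logo.svg", 1500, "CC0"))  # → []
print(logo_problems("example_logo.jpg", 250_000, "CC-BY-SA-4.0"))
```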

When adding new logos, or improving existing ones, those are probably criteria you want to keep in mind. In case the logos changed over time, this can be modelled in Wikidata as well. In those cases consider adding start time (P580) and end time (P582) qualifiers to the logo property (if known), or at the very least ensure that the current variant of the logo has preferred rank (the little up/down arrows on the left side).
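To see why rank and qualifiers matter, consider how a consumer of this data might pick the current logo out of several statements. The following is a hypothetical selection strategy, not necessarily what KPublicTransport does; the statement format is a simplified dict invented for this example.

```python
def pick_current_logo(statements):
    """Pick the logo file that most likely represents the current logo.

    Each statement is a dict with the keys 'file' and 'rank' ('preferred',
    'normal' or 'deprecated') and an optional 'end_time' (simplified,
    hypothetical representation of a P154 statement and its P582 qualifier).
    """
    candidates = [s for s in statements if s.get("rank") != "deprecated"]

    # A preferred-rank statement is the clearest signal.
    for statement in candidates:
        if statement.get("rank") == "preferred":
            return statement["file"]

    # Otherwise fall back to a statement that has no end time set.
    for statement in candidates:
        if statement.get("end_time") is None:
            return statement["file"]
    return None


# Made-up file names: the old logo is marked as ended, so the new one is picked.
logos = [
    {"file": "old_logo.svg", "rank": "normal", "end_time": "2010"},
    {"file": "new_logo.svg", "rank": "normal"},
]
print(pick_current_logo(logos))  # → new_logo.svg
```

Without either a rank or an end time, both statements would look equally valid and the choice would be a guess.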

Contribute!

Reviewing, fixing and completing this information around your home or other places of interest to you is a very easy way to help; all you need is a web browser. Doing this will directly improve the experience of using KDE Itinerary or KTrip for yourself and others, as well as helping everyone else who is building things with this data.