History of all Tags

Posted by tyr_asd on 31 August 2016 in English. Last updated on 1 September 2016.

TL;DR: head over to http://taghistory.raifer.tech/ for usage graphs of arbitrary OSM tags over time (by number of OSM objects).

In OpenStreetMap, tags define what an object is. Whether it is a mountain, a river, a house, or a postbox, every map feature has its own tag (or set of tags).

OSM doesn’t have a fixed set of object categories. Over time, an increasingly faceted and diverse set of features got mapped in OSM, so the number of different tags grew. At the same time, the tagging of a specific thing sometimes changes: features that used to be mapped with one tag get newer, better and more refined tags. That’s OpenStreetMap evolving.

Of course, OpenStreetMap is also still growing, but not all the tags are getting more widely used at the same pace: For example, while it’s quite possible that most of the world’s railway stations are already mapped in OSM, there are still many juicy pastures left to be mapped out there.

a friendly goat

While there exist superb tools for exploring the current state of all tags used in OSM (Taginfo most notably, but also the Overpass API to some extent), until now it was quite difficult to get a good picture of how the data evolved. For example, questions like when a specific tag first came into use, when an obsolete tag was superseded by a different one, or which tags have gained traction lately are difficult to answer with OSM’s current tool set.

For some of these questions, people programmed their own solutions, each answering their own question, like how many kilometres of Italy’s roads were in OSM over time (link), or how many buildings have been mapped in Austria (link). Similarly, the OSM-Analytics platform has recently started to provide such statistics for arbitrary regions for a limited set of map features (currently one can choose between buildings and roads, but there are plans to add more in the near future). What all of those tools have in common is that they can’t handle the full variety of tags that’s so essential in OSM.

To step into the gap between tools like taginfo (where the full variety of OSM’s tags is so beautifully visible – stay tuned for Jochen’s talk at SotM in a couple of weeks!) and the more specialized tools like osm-analytics, I’ve created taghistory, which allows one to get a historical usage graph for each of OSM’s tags (with daily granularity) and to compare different tags against each other:

highway=ford vs. ford=yes

The tool is currently at a very early stage; there are many things to do and improvements to be made. It’s also important to note that the historical usage of a tag is currently defined only as the respective number (count) of OSM objects! As with the statistics produced by taginfo, this metric is subject to some limitations, most notably that one cannot directly compare the number of tags used for different linear and polygonal features such as roads, land cover, etc., because such features are typically divided up into many OSM objects of different sizes. For example, an existing road may be split into two pieces when a new turn restriction is added, which increases the count of each of the tags used on the road (even obsolete ones) by one in the OSM database. That means that one needs to pay close attention when comparing tags that are typically used on such features, even when comparing subtags that are typically used on the same kind of parent object (e.g. different values of the highway tag).
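To make the counting caveat concrete, here is a minimal sketch of how splitting a way inflates per-tag object counts. The dictionary layout and the helper names `count_tags` and `split_way` are assumptions made up for illustration; they are not taken from taghistory’s code.

```python
from collections import Counter


def count_tags(objects):
    """Count how many objects carry each key=value tag."""
    counts = Counter()
    for obj in objects:
        for key, value in obj["tags"].items():
            counts[f"{key}={value}"] += 1
    return counts


def split_way(way, at_index):
    """Split a way at the given node index (hypothetical helper).

    Both halves share the split node and keep a full copy of the tags,
    which is what an editor does when a way has to be split.
    """
    first = {"nodes": way["nodes"][: at_index + 1], "tags": dict(way["tags"])}
    second = {"nodes": way["nodes"][at_index:], "tags": dict(way["tags"])}
    return [first, second]


# A single road, represented as one OSM way with five nodes.
road = {
    "nodes": [1, 2, 3, 4, 5],
    "tags": {"highway": "residential", "name": "Main Street"},
}

before = count_tags([road])
after = count_tags(split_way(road, 2))

print(before["highway=residential"], after["highway=residential"])
```

Nothing about the road changed on the ground, yet every tag on it is now counted twice.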

That being said, have lots of fun digging into the depths of OSM tags’ history. Here’s the link to the tool again: http://taghistory.raifer.tech/ (and the link to the project’s source code repository and issue tracker: https://github.com/tyrasd/taghistory). What’s your favourite tag? I find the created_by graph quite interesting:

history of the usage of the created_by tag

Discussion

Comment from mvexel on 31 August 2016 at 21:33

Very cool!! Interesting to explore the data that way. It’s fun to try and recreate what happened, for example here:

centerturnlane

My guess:

* People started mapping ways with center_turn_lane=yes
* Someone decided that those tags needed to go away and wrote a bot
* Mappers decided to use it anyway :)

Oh but wait:

center-centre

New guess:

* Someone decided that the correct spelling was centre_turn_lane and wrote a bot to rename the tags
* Mappers decided to use center_turn_lane anyway :)

Comment from tyr_asd on 1 September 2016 at 06:58

Math1985 has more interesting examples on his osm diary page: http://www.openstreetmap.org/user/Math1985/diary/39404

Comment from Alecs01 on 1 September 2016 at 19:00

Excellent tool, thanks!

Comment from d1g on 10 September 2016 at 14:43

tyr, I had the idea to include Google results for the “amenity=public_building” query

e.g.

Full timeline:

2006-03-24 wiki: amenity=public_building added to map features

2007-10-16 JOSM: amenity=public_building added

2015-11-15 JOSM: office=administrative, office=government added

2016-03-02 wiki: office=administrative and amenity=public_building deprecated

2016-04-01 JOSM: amenity=public_building dropped and deprecation warning added

These should be drawn as vertical lines with numbers, where every number is linked to an external resource so one can check whether there were any mistakes during the discussion.

We (or Math1985) shouldn’t have to dig through the wiki or any other resource to see when a tag was added/removed/mentioned for the first time.

We definitely need such a tool.

Comment from d1g on 10 September 2016 at 15:08

For example, “payment:troika” was discussed deep in the public transport thread http://forum.openstreetmap.org/viewtopic.php?pid=596732#p596732

http://taginfo.openstreetmap.org/keys/payment%3Atroika#overview

There are no discussions of it on the tagging list, on the wiki, or anywhere else in OSM.

Comment from d1g on 10 September 2016 at 15:25

… but wait, if you search more, “payment:troika” is used in OsmAnd already https://github.com/osmandapp/OsmAnd-resources/commit/996c7727287ebadbea0919d83be4a2d4fa8adccc#diff-bc091b281dee9cb9288fad5990fe5538

and was discussed in some more minor discussions at other channels

Comment from GRUBERND on 15 September 2016 at 19:04

lovely tool. i guess you are using stats from a database analysis. how about counting the nodes associated to way/polygon objects instead of the objects themselves? this would totally eliminate the statistical jumps through splitting, merging and other operations.

Comment from tyr_asd on 15 September 2016 at 21:11

Funny idea, that could indeed partially improve the issue with split ways. Still, a proper solution would have to track the actual length and/or area of the respective objects.
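The difference between the three metrics discussed here (object count, node count, length) can be sketched as follows. The coordinates are made up, and `haversine_m` and `way_length_m` are hypothetical helpers using the standard haversine formula; this is not how taghistory computes anything today.

```python
import math

EARTH_RADIUS_M = 6371008.8  # mean Earth radius in metres


def haversine_m(a, b):
    """Great-circle distance in metres between two (lat, lon) points in degrees."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    dlat, dlon = lat2 - lat1, lon2 - lon1
    h = math.sin(dlat / 2) ** 2 + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(h))


def way_length_m(coords):
    """Sum of the segment lengths along a way."""
    return sum(haversine_m(p, q) for p, q in zip(coords, coords[1:]))


# One way with five nodes (made-up coordinates), then the same way split in the middle.
whole = [(47.07, 15.43), (47.08, 15.44), (47.09, 15.45), (47.10, 15.46), (47.11, 15.47)]
halves = [whole[:3], whole[2:]]  # the split node belongs to both halves

object_counts = (1, len(halves))
node_counts = (len(whole), sum(len(h) for h in halves))
lengths = (way_length_m(whole), sum(way_length_m(h) for h in halves))

print(object_counts, node_counts, lengths)
```

After the split the object count doubles, the node count grows by only one (the shared node), and the total length stays the same. That is why node counts soften the jumps but only length or area removes them.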

Comment from Jojo4u on 30 September 2016 at 12:24

Under which licence do the generated charts stand?

Comment from tyr_asd on 2 October 2016 at 10:05

@Jojo4u: You’re free to do everything you want with the generated charts as long as you comply with ODbL’s minimal requirement for produced works, i.e. citing OSM as the data source. A link back to this blog article and/or the website taghistory.raifer.tech is very much appreciated, though. :)

Comment from joost schouppe on 9 November 2016 at 14:25

Would it be hard to implement permalinking to the charts one makes?

Comment from tyr_asd on 10 November 2016 at 17:14

@Joost, probably not too hard. There’s already a ticket on github for that, where any progress will be documented: https://github.com/tyrasd/taghistory/issues/6

Comment from Polarbear on 16 December 2016 at 21:30

How often is the demo site http://taghistory.raifer.tech/ updated? It seems to be stuck at some time in October or so?

Comment from tyr_asd on 17 December 2016 at 22:48

Sorry, currently, there’s no updates! :( I’ve been looking into doing updates via Overpass’ augmented diff, but I’ve run into some issues which need to be resolved upstream before it can work (see links in https://github.com/tyrasd/taghistory/issues/10). The alternative of reprocessing the history dump every week or month is currently also not really an option because of my limited computing resources.

Comment from SafwatHalaby on 24 October 2017 at 14:52

@tyr_asd have you considered processing daily planet diffs?

Comment from tyr_asd on 25 October 2017 at 09:49

@SafwatHalaby: yes, but regular planet diffs don’t contain all necessary data to keep this kind of data up to date (because they don’t include the tags the modified osm objects had before the diff). Overpass’ augmented diffs could in principle work, but they have other technical issues, see: https://github.com/tyrasd/taghistory/issues/10

Comment from SafwatHalaby on 25 October 2017 at 11:30

I think in theory you can have all the info.

- An initial OSM DB populated from a planet file starting from where you’ve ceased updating your current data
- When a new diff arrives, the old state of each node is in your OSM DB, and the new state can be extracted from the diff. These are used to update the statistics.
- Apply the diff to the OSM DB planet file, and now you can repeat the process the next day.

I don’t know how easy or complicated that’d be in practice.
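The steps above can be sketched roughly as follows. This is a toy illustration with an in-memory dictionary standing in for the OSM DB; the diff format (a list of `(object_id, new_tags)` pairs, with `None` meaning deletion) and the name `apply_diff` are assumptions for this example, not the real planet diff format.

```python
from collections import Counter


def apply_diff(db, tag_counts, diff):
    """Update tag counts from a diff, using the previous state stored in db.

    db:         dict mapping object id -> dict of tags (state before the diff)
    tag_counts: Counter mapping (key, value) -> number of objects
    diff:       list of (object_id, new_tags); new_tags is None for a deletion
    """
    for obj_id, new_tags in diff:
        # The diff itself does not say which tags the object had before,
        # so the old tags have to come from the local database.
        for tag in db.get(obj_id, {}).items():
            tag_counts[tag] -= 1
        if new_tags is None:
            db.pop(obj_id, None)
        else:
            for tag in new_tags.items():
                tag_counts[tag] += 1
            db[obj_id] = dict(new_tags)
    # Drop tags that are no longer used at all.
    for tag in [t for t, n in tag_counts.items() if n == 0]:
        del tag_counts[tag]
    return tag_counts


# State on day 1.
db = {
    "way/1": {"center_turn_lane": "yes"},
    "node/2": {"amenity": "post_box"},
}
counts = Counter(tag for tags in db.values() for tag in tags.items())

# Diff for day 2: one retag, one deletion, one new object.
diff = [
    ("way/1", {"centre_turn_lane": "yes"}),
    ("node/2", None),
    ("node/3", {"amenity": "post_box"}),
]
apply_diff(db, counts, diff)
print(dict(counts))
```

The sketch also shows the catch: the old tags are only available because the full previous state is kept around, which is the part that needs a database of all OSM data.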

Comment from tyr_asd on 25 October 2017 at 18:40

Yes, sure. But one of my main design goals for this tool was to avoid having to set up and run any kind of database containing the full OSM data, so that’s not really an option for me, unfortunately.

But if anyone out there already runs a (daily) updated OSM DB which could produce deltas of the counts of tags in their db (for little extra processing cost), please contact me – I’d love to use that data for updating the taghistory service.

Comment from dieterdreist on 22 June 2018 at 09:44

Martin, this is such a great tool, I am using it all the time! Thank you very much also for the recent update (because outdated data makes it less useful, obviously). Would it be complicated to add a permalink function? (one single tag / key would already be much better than nothing). It would make it easier to share findings with others, where space is limited (e.g. mailing lists don’t like picture attachments).

Comment from marc__marc on 7 August 2018 at 22:27

@tyr_asd did you have a POC to test your idea of a weekly/daily run?

- What’s the memory requirement and CPU usage for the current script?
- With an already minutely-updated OSM DB, what would be the little extra processing cost to produce the deltas of the tag counts you need? Do you already have the script, or would it need to be created?
- I like the idea of using taginfo to avoid the need to parse the same file twice for the same kind of info… but I understand that this may need additional dev time

Comment from Grillo on 12 August 2019 at 22:17

Since mid 2018 this tool doesn’t seem to work properly anymore, as in no exact numbers are given. Any ideas why?

Comment from MalgiK on 6 February 2020 at 17:52

Thanks a lot for adding the perma-link functionality :-)

Comment from mueschel on 17 July 2020 at 17:47

Could we get another update of the database? I like the tool, but current data is already a year old.

For regular updates: You can get the total numbers of each key from Taginfo. The API provides a convenient way to download one table with all ~80k keys currently in use. You can’t query the past history, but it should be perfectly fine for a daily or weekly update of numbers.
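A rough sketch of what such a periodic snapshot could look like is below. The endpoint URL and the `data`, `key` and `count_all` field names reflect my understanding of the taginfo API (version 4) and should be checked against its documentation; `parse_key_counts` and `append_snapshot` are hypothetical helpers, not part of taghistory.

```python
import json
import urllib.request

# Assumed endpoint of the taginfo API; verify before relying on it.
TAGINFO_KEYS_URL = "https://taginfo.openstreetmap.org/api/4/keys/all"


def parse_key_counts(payload):
    """Extract {key: total object count} from a parsed taginfo response."""
    return {row["key"]: row["count_all"] for row in payload["data"]}


def fetch_key_counts(url=TAGINFO_KEYS_URL):
    """Download the table of all keys (network access required)."""
    with urllib.request.urlopen(url) as response:
        return parse_key_counts(json.load(response))


def append_snapshot(history, date, counts):
    """Add one dated data point per key to the running history."""
    for key, count in counts.items():
        history.setdefault(key, []).append((date, count))
    return history


# Offline example with made-up numbers, shaped like the assumed response.
sample = {"data": [{"key": "building", "count_all": 100}, {"key": "ford", "count_all": 7}]}
history = {}
append_snapshot(history, "2020-07-17", parse_key_counts(sample))
print(history)
```

Run daily or weekly (for example from a cron job calling `fetch_key_counts`), this would only build up history from the first run onwards and only per key, not per key=value tag.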

Comment from DaveF on 2 November 2020 at 21:05

Hi. Another request to update the database, as I’ve found this to be a useful tool. Would it take much time to incorporate Taginfo’s data?

Comment from tyr_asd on 16 November 2020 at 11:19

Taginfo now also features historic development data in its new “chronology” tab (see https://blog.jochentopf.com/2020-11-08-10-years-of-taginfo.html). It’s limited to tag keys and the “most frequent tags”, but I think this should already solve most needs for current tag count statistics (and for the rest one can still use https://api.ohsome.org). PS: Taghistory’s web interface is updated now to also fetch taginfo’s chronology data if available.

Comment from Matija Nalis on 13 May 2023 at 22:07

So, does that “Taghistory’s web interface is updated now to also fetch taginfo’s chronology data if available” mean that taghistory now too only works for “most frequent tags”?

I.e. https://github.com/tyrasd/taghistory/issues/34

Comment from tyr_asd on 14 May 2023 at 08:50

@Matija Nalis, well, it still shows the history for all tags up to some time in 2018. But I guess that after 5 years the added usefulness of that partial data is quite limited.
