
Checking OSM Data Completeness

Posted by feyeandal on 16 July 2020 in English.

If you have a set of geospatial data (e.g. buildings, banks, hospitals, schools) and want to make sure the tags/attributes on every feature are “complete” before uploading it to OSM, check its data completeness first. To assess whether your data is “complete”, start with a list of the tags you expect each feature to have.

Sample houses data opened in JOSM

I wrote a simple Python script to check the data completeness of a specific dataset.

In general, the script opens and reads a GeoJSON file and uses a JSON schema to annotate and validate every feature in the file. The schema defines the set of attributes you expect from your data. If a feature meets all of them, it is tagged as valid (complete); otherwise it is tagged as invalid (incomplete). The output of the function is then saved to a new CSV file.
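The workflow described above could be sketched roughly as follows. This is not the original script: the function names (`is_complete`, `check_completeness`), the `REQUIRED_TAGS` list, and the output column names are assumptions for illustration, and a plain required-key check stands in for the JSON schema validation.

```python
import csv
import io
import json

# Hypothetical list of tags every feature is expected to carry.
REQUIRED_TAGS = ["building", "building:levels"]


def is_complete(properties, required=REQUIRED_TAGS):
    """Return True if every required tag is present and non-empty."""
    return all(properties.get(tag) not in (None, "") for tag in required)


def check_completeness(geojson_text, out_file):
    """Read GeoJSON text, classify each feature, and write the result as CSV."""
    features = json.loads(geojson_text)["features"]
    writer = csv.writer(out_file)
    writer.writerow(["index", "status"])
    for index, feature in enumerate(features):
        props = feature.get("properties") or {}
        writer.writerow([index, "valid" if is_complete(props) else "invalid"])


# Made-up sample data: the second feature has no value for "building".
sample = json.dumps({
    "type": "FeatureCollection",
    "features": [
        {"type": "Feature", "geometry": None,
         "properties": {"building": "house", "building:levels": 1}},
        {"type": "Feature", "geometry": None,
         "properties": {"building": None, "building:levels": 2}},
    ],
})

report = io.StringIO()
check_completeness(sample, report)
print(report.getvalue())
```

To write the report to disk instead, pass a file opened with `open(path, "w", newline="")` as `out_file`.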

Sample schema (groups of tags) I set to assess the completeness of the input file

In my schema, each key has a type requirement (string or number) and a defined set of expected values. I also added a required keyword to list the fields a feature must have to be tagged as valid/complete.
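A schema of this shape might look like the sketch below. The tag names and allowed values are illustrative assumptions, not the schema from the screenshot. To keep the example dependency-free, `find_errors` is a hypothetical helper that interprets only the `type`, `enum`, `required`, and `properties` keywords, and it treats a null value as missing; a full JSON Schema validator such as the `jsonschema` package covers the rest of the specification.

```python
# Illustrative schema: tag names and allowed values are assumptions.
SCHEMA = {
    "type": "object",
    "properties": {
        "building": {"type": "string", "enum": ["house", "residential", "yes"]},
        "building:levels": {"type": "number"},
        "roof:material": {"type": "string", "enum": ["metal", "concrete", "wood"]},
    },
    "required": ["building", "building:levels", "roof:material"],
}

# Python types accepted for each JSON Schema "type" handled here.
TYPE_MAP = {"string": str, "number": (int, float), "object": dict}


def find_errors(properties, schema=SCHEMA):
    """Return a list of problems; an empty list means the feature is complete."""
    errors = []
    for key in schema.get("required", []):
        if properties.get(key) is None:
            errors.append(f"{key}: missing required value")
    for key, rule in schema.get("properties", {}).items():
        value = properties.get(key)
        if value is None:
            continue  # already reported above if the key is required
        expected = TYPE_MAP[rule["type"]]
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, expected):
            errors.append(f"{key}: expected {rule['type']}")
        elif "enum" in rule and value not in rule["enum"]:
            errors.append(f"{key}: unexpected value {value!r}")
    return errors


complete = {"building": "house", "building:levels": 2, "roof:material": "metal"}
incomplete = {"building": None, "building:levels": "2", "roof:material": "metal"}
print(find_errors(complete))
print(find_errors(incomplete))
```

Returning the list of problems rather than a bare valid/invalid flag makes it easier to see what has to be fixed on each feature.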

Sample output from the script: indicating which objects are valid or invalid

From the output file, you can now easily check which nodes need to be fixed! Let’s take a quick look at node #7, which turned out to be invalid. You can convert your GeoJSON file to CSV format so the data is easy to view as a table.
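One way to do that conversion with only the Python standard library is sketched below. The function name `geojson_to_csv` and the `index` column (the feature's position in the file, counted from 0) are assumptions for illustration; geometry is dropped and only the tags are kept.

```python
import csv
import io
import json


def geojson_to_csv(geojson_text, out_file):
    """Write one CSV row per feature, one column per tag seen in the file."""
    features = json.loads(geojson_text)["features"]
    rows = [feature.get("properties") or {} for feature in features]
    # Collect column names in first-seen order so the table is stable.
    columns = []
    for row in rows:
        for key in row:
            if key not in columns:
                columns.append(key)
    writer = csv.DictWriter(out_file, fieldnames=["index"] + columns, restval="")
    writer.writeheader()
    for index, row in enumerate(rows):
        # Null values are written as empty cells, which makes gaps easy to spot.
        writer.writerow({"index": index, **row})


# Made-up sample data for the demonstration.
sample = json.dumps({
    "type": "FeatureCollection",
    "features": [
        {"type": "Feature", "geometry": None,
         "properties": {"building": "house", "building:levels": 1}},
        {"type": "Feature", "geometry": None,
         "properties": {"building": None, "building:levels": "2"}},
    ],
})

table = io.StringIO()
geojson_to_csv(sample, table)
print(table.getvalue())
```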

Node 7 tagged as invalid

As the screenshot shows, the value for the key building is missing. Since I set all fields as required in the JSON schema in the script, no column may have a null value. Thus, node #7 was tagged as invalid.

Node 8 tagged as invalid

Another example: here the value for building:levels is a string. Since I set the building:levels type to number in the JSON schema, this node was tagged as invalid.
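If many features fail only because a number was stored as text, one possible fix before re-running the check is to convert numeric strings to numbers. The helper below is a hypothetical sketch and not part of the original script; values that do not parse as numbers are left untouched for manual review.

```python
def coerce_levels(properties, key="building:levels"):
    """Return a copy of properties with a numeric string under key turned into a number."""
    fixed = dict(properties)
    value = fixed.get(key)
    if isinstance(value, str):
        try:
            number = float(value.strip())
        except ValueError:
            return fixed  # not numeric (e.g. "two"); leave it for manual review
        # Keep whole numbers as int so 2 is not written back as 2.0.
        fixed[key] = int(number) if number.is_integer() else number
    return fixed


print(coerce_levels({"building": "house", "building:levels": "2"}))
```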

So, there! I hope you also find this script useful for your validation.

Discussion

Comment from CYDD on 16 July 2020 at 11:52

Great one @feyeandal. I will try to play with this code and extend it further.