PostGIS for beginners
UPDATE: Shortly after submitting this post, up popped another from Paul Ramsey that does a really good job of explaining why things are done the way they are. I recommend you read it!
Via Paul Ramsey, this post popped onto my radar the other day and got me thinking. My initial impressions about the post were quite negative, and to be honest some of the points still mystify me, but after further investigation, at least some of the issues do make sense, so perhaps there is some room for improvement in our favourite spatial database. If you haven’t read it, do that and come back. I’ll wait…
Let’s take the first statement: that PostGIS can be “mind-numbingly difficult to install and use”. The author of the post is mainly talking about Ubuntu servers, so I think it would be fair to assume some level of IT literacy here. If you do a web search for “Ubuntu PostGIS”, which these days is likely to be the new user’s first port of call, then for me at least the first few links are generally to old blog posts explaining, often with long lists of commands, how to install on, say, Ubuntu 7.10 or 8.04, and then there are some posts about compiling the latest version of PostGIS from source. There are some links to information about modern versions of Ubuntu as well, but they are not top of the list. Most links also say to add another PPA to your list of repositories, which is fairly standard for Linux, but if you approach this with a new-user mindset then it’s not very reassuring.
So I went looking to see if PostGIS was in the official Ubuntu repositories, and it is, but you’d have to go looking to find it. I wouldn’t go as far as saying mind-numbingly difficult, but as a new user you could end up making things a lot more complicated than they need to be. I’m not sure what the solution is though. Manipulate Google search results so that the official Ubuntu repositories appear at the top?
Buried amongst this discussion is a point about back-porting fixes or patches to prior stable releases. This might happen, and I can see a scenario where it could be a pain for a system maintainer, but since when was this a problem just with PostGIS, or FOSS in general? It’s just as prevalent, if not worse, in the proprietary software world.
Onto import and export. I tend to agree here, up to a point, that trying to use shp2pgsql at the command line as a new user is not easy. However, every time I’ve done an install recently, I’ve been asked whether I want to include the shp2pgsql loader for pgAdmin, or indeed I’ve just gone with the import tool that comes with QGIS. So using the command line for a simple import and export is not really necessary at all. Anyway, yes, please get the SRID right rather than auto-populating it with -1, and please use the (sanitised) shapefile name as the default table name. Yes, make it easier to load a CSV file with an easting and a northing in it, rather than making users go to ogr2ogr and learn another command line syntax.
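To make those two wishes concrete, here is a rough Python sketch of what I mean. It is not how shp2pgsql or ogr2ogr work internally; the function names, the `easting` and `northing` column headers, the `geom` column and the SRID in the usage note are all my own assumptions for illustration. It derives a sanitised table name from a file name and turns CSV rows into INSERT statements that set the SRID explicitly.

```python
import csv
import io
import re


def table_name_from_path(path):
    """Derive a lower-case, SQL-safe table name from a shapefile or CSV path."""
    # Keep only the file name, then drop the extension.
    stem = re.split(r"[\\/]", path)[-1].rsplit(".", 1)[0]
    # Collapse anything that is not a-z, 0-9 or underscore into one underscore.
    name = re.sub(r"[^a-z0-9_]+", "_", stem.lower()).strip("_")
    # Unquoted identifiers cannot start with a digit, so prefix those with "t_".
    return name if name and not name[0].isdigit() else "t_" + name


def csv_to_inserts(csv_text, table, srid):
    """Yield one INSERT per CSV row, building a point with an explicit SRID."""
    # Assumed (hypothetical) column headers: "easting" and "northing".
    for row in csv.DictReader(io.StringIO(csv_text)):
        # float() raises ValueError on text that does not parse as a number.
        x, y = float(row["easting"]), float(row["northing"])
        yield (
            f"INSERT INTO {table} (geom) "
            f"VALUES (ST_SetSRID(ST_MakePoint({x}, {y}), {int(srid)}));"
        )
```

For example, `table_name_from_path("data/My Roads 2012.shp")` gives `my_roads_2012`, and passing an SRID such as 27700 (British National Grid, assumed here purely as an example) stamps every point with that reference system instead of -1.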
However, we then get on to the section that I most vehemently disagree with. The general premise seems to be that the end user should not worry their pretty little head about the database they are importing into, or the user that they are using to do it, or the spatial reference system that the data comes with. Automating all of these processes might make it easier for the end user, but at the expense of them actually understanding what they are doing. The minute that this clever automated process fails, or puts the data somewhere you didn’t expect, you can be sure that a lot of end users will decide that “open source is rubbish, where’s my ArcGIS”. Been there, seen that. Teaching people to press buttons without thinking leads to rubbish output, be that in open source, proprietary, GIS, or any other software.
Forward-compatibility of backups. I’m such a fan of the inherent future-proofing and openness of a plain-text SQL dump, and I’ve never hit an issue with upgrades if I follow the instructions, so this surprised me. Coming at this from a new user’s perspective, though, I can see that it’s not always straightforward. Even so, progress and improvement of software mean it’s not always possible to guarantee total compatibility between versions. Again, this is not the sole province of open source. How many times have you had issues with a doc to docx conversion in Microsoft Office, for example? (Answer: many.)
Geometry invalidity: yes, I see this all the time. A client sends data in MapInfo or shapefile format. We load it into PostgreSQL, and it croaks. Sometimes the “really small buffer” trick, or similar, works; sometimes we have to go back to the client and ask them to resolve the issue. Again, I would rather do this, and enforce ideas of data validity and quality, than dumb things down so we never have to think about what we’re doing.
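For anyone who hasn’t met the buffer trick, this is roughly the kind of SQL involved, wrapped in a small Python sketch that only builds the statements. The helper names, the `parcels` table in the usage note and the `geom` column are hypothetical, and I’ve used a zero-distance buffer here, which is a common variant of the “really small buffer” approach rather than exactly what we run on any given job.

```python
def validity_report_sql(table, geom_col="geom"):
    """SQL listing PostGIS's explanation for each invalid geometry."""
    return (
        f"SELECT ST_IsValidReason({geom_col}) AS reason "
        f"FROM {table} WHERE NOT ST_IsValid({geom_col});"
    )


def zero_buffer_repair_sql(table, geom_col="geom"):
    """SQL applying a zero-distance buffer to the invalid geometries only."""
    return (
        f"UPDATE {table} SET {geom_col} = ST_Buffer({geom_col}, 0) "
        f"WHERE NOT ST_IsValid({geom_col});"
    )
```

Running the output of `validity_report_sql("parcels")` first is the point: you read why the geometry is broken before deciding whether a buffer is an acceptable fix or whether the data needs to go back to the client.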
Finally, it’s worth remembering that PostGIS is a spatial extension; it’s not a program in its own right. Comparing it to ArcToolbox is like comparing a car engine with a complete car. Amazingly, the whole article was written without a single mention of pgAdmin, or QGIS, or any other interface that will work with PostGIS and provide users with a lot of the bells and whistles that the author is asking for.
In conclusion, I reluctantly find that I agree with the points around installation. It could be easier, or at least the documentation could be easier to find. Import into PostgreSQL could be easier, or you can use QGIS. But please, don’t make it so easy that users don’t have to think!