Pleiades by Country: Iraq and Syria

I was recently asked by a colleague how many "Iraqi and Syrian places" there are in Pleiades. Pleiades is not set up to automatically answer that kind of question; indeed we don't store modern national boundaries in the system at all. So, I thought I'd try to start a series of blog posts in which I work through getting the answer. Hopefully along the way we'll observe some useful things about the structure of Pleiades data and develop some ideas about how to exploit it and make it more useful.

First, a caveat. The structure of the Pleiades dataset is not simple, so the first rule of GIS applies here (as elsewhere): get to know your dataset before you draw conclusions from it. For Pleiades, essential reading includes: "Pleiades Data Model" and the "Pleiades Downloads" page, as well as the README file for any dataset one downloads.

Next, we need data. For the Pleiades place resources, we'll grab the latest nightly "dump" file, in CSV format:

curl -O

For modern countries, we'll grab the latest version (March 2013) of the US State Department's Detailed World Polygons file, derived from the Large Scale International Boundaries (LSIB) dataset.

curl -O

With both of these datasets pulled into a GIS (I'm using QGIS) in their native geographic reference system (both are in WGS-84), our first order of business is obvious: discard Pleiades data that is not in Iraq.
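Before trusting what the GIS draws, it can help to look at the CSV directly. The following is a minimal sketch rather than a description of the actual dump: the column names `reprLat`, `reprLong` and `title` are my assumptions about the file layout (check them against the README), and the sample rows are invented placeholders. It simply separates rows that carry usable representative coordinates from rows that do not, since only the former can take part in a spatial selection.

```python
import csv
import io

# Invented placeholder rows, NOT real Pleiades records. The column names
# (title, reprLat, reprLong) are assumptions; verify them in the README.
SAMPLE = """title,reprLat,reprLong
Example A,33.1,44.6
Example B,,
Example C,36.4,43.2
"""

def load_located_places(csv_file, lat_col="reprLat", lon_col="reprLong"):
    """Return ([(title, lat, lon), ...], skipped_count) for a CSV file object.

    Rows whose coordinate fields are missing, empty, or non-numeric are
    counted as skipped rather than silently dropped.
    """
    located = []
    skipped = 0
    for row in csv.DictReader(csv_file):
        try:
            lat = float(row[lat_col])
            lon = float(row[lon_col])
        except (KeyError, ValueError, TypeError):
            skipped += 1
            continue
        located.append((row.get("title", ""), lat, lon))
    return located, skipped

located, skipped = load_located_places(io.StringIO(SAMPLE))
print(len(located), "rows with coordinates;", skipped, "without")
```

For the real file, one would pass an open file handle instead of the `io.StringIO` sample. Keeping a count of the skipped rows is deliberate: places without coordinates are invisible on the map, so they are easy to forget when answering a "how many" question.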

Since the State Department claims online that the LSIB boundary data is accurate to within "a couple of kilometers" and I don't have any more accurate metadata for these boundaries, I'd like to buffer the Iraq country polygon by 5 kilometers before running an intersection selection on the Pleiades data. To do this easily in QGIS (with the vector geoprocessing buffer tool), I first need to reproject both datasets into a coordinate system that uses meters rather than degrees. Not wanting to spend too much time pondering, I did a quick web search, came across Dwayne Wilkins' Iraqi GIS Projections page, and picked the UTM 38N (WGS-84) projection, which is already provided, along with so many others, in QGIS. In saving off the projected version of the country boundaries, I selected just the polygon for Iraq in order to save myself a bit of time later. Here are the resulting datasets in their new cartographic projection (zoomed in a general way to southwest Asia):
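The choice of UTM 38N, and the reason a buffer in degrees would be awkward, can both be checked with a little arithmetic. The sketch below is mine, not part of the QGIS workflow: it uses the standard six-degree UTM zone rule, the EPSG numbering convention for WGS-84 UTM zones (32600 plus the zone number in the northern hemisphere), and a rough spherical figure of 111.32 km per degree at the equator. The Baghdad coordinates are approximate and serve only as an illustration.

```python
import math

def utm_zone(lon_deg):
    """Standard 6-degree UTM zone number (ignores the Norway/Svalbard exceptions)."""
    return int((lon_deg + 180) // 6) + 1

def wgs84_utm_epsg(lon_deg, lat_deg):
    """EPSG code of the WGS-84 UTM zone containing the given point."""
    base = 32600 if lat_deg >= 0 else 32700
    return base + utm_zone(lon_deg)

def km_per_degree_lon(lat_deg, km_per_degree_at_equator=111.32):
    """Approximate east-west length of one degree of longitude (spherical model)."""
    return km_per_degree_at_equator * math.cos(math.radians(lat_deg))

# Approximate location of Baghdad, used only as an illustration.
lat, lon = 33.3, 44.4
print(utm_zone(lon))             # 38
print(wgs84_utm_epsg(lon, lat))  # 32638
print(round(km_per_degree_lon(lat), 1), "km per degree of longitude")
```

Because a degree of longitude shrinks with latitude while a degree of latitude stays close to 111 km, a buffer specified in degrees would be narrower east-west than north-south. Zone 38 nominally covers 42°E to 48°E, so the western and easternmost parts of Iraq lie outside it; for a 5 km buffer that compromise seems acceptable, but it is worth keeping in mind.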

Since we're now in a projection that uses meters, it's an easy matter to set the parameters in the buffer tool so that we get a 5km exterior buffer around the country polygon:

In QGIS, vector -> geoprocessing -> intersect lets us select just those Pleiades places that fall within that buffer polygon (446 point features, including at least one that certainly falls in modern Syria and another in Iran, but let's not worry about that for the moment):
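To make the logic of the selection explicit, here is a minimal pure-Python sketch of the same idea. It is not what QGIS does internally: instead of constructing a buffer polygon, it treats a point as selected if it lies inside the boundary or within 5,000 meters of any boundary segment, which should amount to the same test for a simple polygon. It assumes coordinates already projected into meters, a single ring with no holes, and a toy 100 km square standing in for the country outline.

```python
import math

def point_in_ring(x, y, ring):
    """Ray-casting test: is (x, y) inside the closed ring of (x, y) vertices?"""
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        if (y1 > y) != (y2 > y):  # edge straddles the horizontal ray
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside

def distance_to_segment(x, y, x1, y1, x2, y2):
    """Shortest planar distance from (x, y) to the segment (x1, y1)-(x2, y2)."""
    dx, dy = x2 - x1, y2 - y1
    length_sq = dx * dx + dy * dy
    if length_sq == 0:
        return math.hypot(x - x1, y - y1)
    # Clamp the projection parameter so the nearest point stays on the segment.
    t = max(0.0, min(1.0, ((x - x1) * dx + (y - y1) * dy) / length_sq))
    return math.hypot(x - (x1 + t * dx), y - (y1 + t * dy))

def within_buffer(x, y, ring, buffer_m=5000.0):
    """True if the point is inside the ring or within buffer_m of its boundary."""
    if point_in_ring(x, y, ring):
        return True
    n = len(ring)
    return any(
        distance_to_segment(x, y, *ring[i], *ring[(i + 1) % n]) <= buffer_m
        for i in range(n)
    )

# Toy 100 km square in projected meters, standing in for the country polygon.
square = [(0, 0), (100_000, 0), (100_000, 100_000), (0, 100_000)]
points = {"inside": (50_000, 50_000), "3 km out": (-3_000, 50_000),
          "8 km out": (-8_000, 50_000)}
selected = [name for name, (px, py) in points.items()
            if within_buffer(px, py, square)]
print(selected)
```

The point 3 km outside the square is selected along with the interior point, while the one 8 km out is not. This also shows why a few places in neighboring countries end up in the result: anything within the buffer distance of the border is kept, on either side.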

It's at this point that we have to fall back on our knowledge of the Pleiades dataset. Pleiades is in the first instance a historical and archaeological gazetteer, not a database of extant archaeological sites.