High Resolution Population Density Maps and Demographic Estimates
Methodology
ENTER THESE DETAILS IN THE AWS CONSOLE’S FOUR-STEP ADD TABLE WIZARD:
- Name & Location
- Choose a database and table name such as “hrsl”
- Location of data: s3://dataforgood-fb-data/demographic_csvs
2. Data Format
- HRSL data files are served as comma-separated values (CSV)
3. Columns
Data files include three columns:
- latitude (float)
- longitude (float)
- demographic_category (float)
4. Partitions
One partition is defined:
- type (string) – demographic groups, one of “men”, “women”, “children_under_five”, “elderly_60_plus”, “women_of_reproductive_age_15_49”, or “youth_15_24”
ISO
After completing the four steps in the wizard to run a pre-populated CREATE TABLE query, look for a “Query successful” note at bottom and follow the included link to “load all partitions”.
Load all HRSL partitions. This will take approximately 2 minutes. When it’s successful, the table listing in the left column includes longitude, latitude, and population floating point columns and month, country, and type table partitions.
Query data in Athena using standard SQL. For example, to request the total population of Zimbabwe, sum the count of men and women with country code “ZWE”:
SELECT min(latitude) AS min_lat,
min(longitude) AS min_lon,
max(latitude) AS max_lat,
max(longitude) AS max_lon,
cast(sum(population) as integer) AS population,
country
FROM hrsl
WHERE month='2019-06'
AND country='ZWE'
AND type IN ('men', 'women')
GROUP BY countryAthena can perform more complex geospatial queries. To retrieve the total population in an area surrounding Lake Victoria, start by defining a freehand polygon at geojson.io and exporting it as a WKT string.
Use the ST_Polygon and ST_Point constructors to create two geometries and compare them with ST_Within to select only population samples inside the designated area above:
SELECT min(latitude) AS min_lat,
min(longitude) AS min_lon,
max(latitude) AS max_lat,
max(longitude) AS max_lon,
cast(sum(population) as integer) AS population,
country
FROM hrsl
WHERE month='2019-06'
AND type IN ('men', 'women')
AND ST_Within(ST_Point(longitude, latitude), ST_Polygon('POLYGON ((30.454 3.294, 28.959 0.659, 28.125 -3.294, 29.838 -8.754, 33.574 -10.141, 39.726 -8.581, 41.835 -2.021, 38.232 3.995, 32.827 4.390, 30.454 3.294))'))
GROUP BY countryThis query returned covered populations in six countries after 92 seconds and cost $0.01 for 2.97GB of data transferred. Add a list of interesting countries to the query to reduce the cost to 23 seconds and $0.0008 for 167MB of data:
SELECT min(latitude) AS min_lat,
min(longitude) AS min_lon,
max(latitude) AS max_lat,
max(longitude) AS max_lon,
cast(sum(population) AS integer) AS population,
country
FROM hrsl
WHERE month='2019-06'
AND type IN ('men', 'women')
AND country IN ('UGA', 'COD', 'ZMB', 'BDI', 'RWA', 'TZA')
AND ST_Within(ST_Point(longitude, latitude), ST_Polygon('POLYGON ((30.454 3.294, 28.959 0.659, 28.125 -3.294, 29.838 -8.754, 33.574 -10.141, 39.726 -8.581, 41.835 -2.021, 38.232 3.995, 32.827 4.390, 30.454 3.294))'))
GROUP BY country