urban data

I got a signature from Brent Spiner (Data from Star Trek) for my bro. I didn’t have to wait in line at all because I got to the convention very late and he was just there chilling, so I spoke with him for maybe five minutes since he had nothing else to do. Such a gem, this guy. His humor is so dry, and he seems so calm and caring. I was wearing my scarf around my face for my cosplay and he said ‘Wait what, where are you, I can’t see you!’ I think he picked up on the fact that I was nervous and shy, so he was very considerate and made me feel like I didn’t have to worry at all.

I mentioned some scenes of his that I liked and he chuckled and said “I get this every time. I honestly don’t remember and have to ask myself ‘was I actually in this show?’”

He was so surprised by my name (Feline) and had to know everything about its ‘origins’. Then we talked about cats for a good few moments. And then Karl Urban was suddenly standing behind me with his arm around my shoulder saying ‘Why are you asking this idiot for a signature? I’m way more awesome.’

And I just stood there kind of in shock (because holy shit, I had never seen an actual Hollywood-pretty face up close, it was so unreal) and wanted to apologise but nothing came out of my mouth.

Anyways, thank you, Brent, you’re a very fine human too.

You too, Karl, you’re cocky in a lovable way.

The U.S. cities that gained the most workers over the last 12 months

One of the great things about social media is that it gives us access to data that previously didn’t exist or was difficult to collect.

Take, for example, LinkedIn’s monthly report on employment trends called the Workforce Report. They look at which industries are hiring, where people are moving for jobs, and so on. Click here for the June 2017 edition. 

Note that architecture/engineering hiring appears to be up nationally, which is usually a positive leading indicator.

I’ll leave you all to go through the report, but I did want to pull out a few of their maps and one of their takeaways. Below are maps of the cities that lost the most workers and gained the most workers over the last 12 months.

The established trend of people moving from colder northern cities to warmer amenity-rich cities seems to play out here.

That said, one of their “key insights” is that fewer workers today are moving to the San Francisco Bay Area. Since February 2017, there has been a 17% decline in the net number of workers moving there.

They blame housing affordability (ahem, lack of supply). People are simply turning to other great cities like Seattle, Portland, Denver, and Austin. They’re growing and cheaper.

One of the other cool things about the report is that you can drill down into individual cities to see where people are moving from. I looked up Miami and Chicago just to do a quick comparison. 

Not surprisingly, Miami is seeing a significant contingent from South America. What’s interesting about this random comparison is how international Miami is and how regional Chicago is in terms of their draws.

I would love to see similar data for Canada. This is valuable stuff.

Watch on deerstalker-filmmaker.tumblr.com

This aesthetic soothes me to no end!!! Visuals produced by me using some sweet tools! Music by Kyle Dixon for Stranger Things 2.

#photography #film #Filmmaking #road #street #urban #city #data #vscocam #aesthetic #cyberpunk #neon #80s #light #action #VSCO #vlog #cinematography #director #synthwave #hypebeast #indie #strangerthings #London #apple #iphone7 #sun #like4like @glitcheapp (at Los Angeles, California)


A block is by no means a standard unit of measurement. Depending on the urban plan, blocks can be square or oblong, and can vary significantly in side length. For this plot, I measured the median length of downtown blocks in six cities that have regular grid layouts – that is, the dimensions of their city blocks are consistent. I’ve included blocks per mile along the outer axes, and the ratio of short to long dimension for each city. For this value, one represents a perfect square while smaller values indicate more oblong blocks.

Data source: Measurements made using Google Earth.
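The two derived values in the plot are simple arithmetic, and a minimal sketch may make them concrete. The block dimensions below are hypothetical, not measurements from the plot, and street widths are ignored, so the blocks-per-mile figures are upper bounds.

```python
FEET_PER_MILE = 5280


def block_metrics(short_ft, long_ft):
    """Return (short/long ratio, blocks per mile on the short side,
    blocks per mile on the long side) for one block.

    Street widths are ignored, which is a simplifying assumption.
    """
    if short_ft <= 0 or long_ft < short_ft:
        raise ValueError("expected 0 < short_ft <= long_ft")
    ratio = short_ft / long_ft  # 1.0 means a perfect square
    return ratio, FEET_PER_MILE / short_ft, FEET_PER_MILE / long_ft


# Hypothetical block of 264 ft by 660 ft (illustrative values only).
ratio, per_mile_short, per_mile_long = block_metrics(264, 660)
print(ratio, per_mile_short, per_mile_long)
```

A ratio near one indicates a nearly square block, and the two blocks-per-mile values correspond to the two outer axes described above.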

K-Means Clustering

Gapminder data again. This time I will try to cluster countries using the key variables identified in previous weeks: income (transformed to be in thousands), internet usage, life expectancy, urban rate and polity score. All variables were standardised to have a mean of zero and a standard deviation of one. Observations with missing data in any of the key fields were dropped.
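As a rough illustration of the standardisation step, here is a minimal Python sketch. It is not the SAS code used for the analysis, and the income values are made up. It uses the sample standard deviation (n - 1 divisor).

```python
from statistics import mean, stdev


def standardise(values):
    """Rescale values to mean 0 and sample standard deviation 1."""
    m = mean(values)
    s = stdev(values)  # sample standard deviation (n - 1 divisor)
    if s == 0:
        raise ValueError("cannot standardise a constant variable")
    return [(v - m) / s for v in values]


# Hypothetical incomes in thousands of dollars (illustrative only).
income_k = [1.2, 3.4, 8.9, 25.0, 39.5]
z = standardise(income_k)
print([round(v, 3) for v in z])
```

Standardising matters here because k-means works on distances, so a variable on a larger scale would otherwise dominate the clustering.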

I will then test whether the clusters differ significantly in their HIV rates.

Because I only start with 144 observations, I have decided not to create separate test and training sets.

I ran the cluster code for 1 to 9 clusters and plotted the r-square values. The plot was pretty conclusive that 3 clusters would be a good cutoff; going to 4 would actually make it worse.
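For readers unfamiliar with the r-square value on an elbow curve, the sketch below shows one common definition: the share of total variation explained by the clusters, that is 1 minus the within-cluster sum of squares over the total sum of squares. It assumes the cluster labels are already known and does not run k-means itself. The points are invented, and this is an illustration rather than the exact computation the SAS procedure performs.

```python
def r_square(points, labels):
    """Proportion of total sum of squares explained by the clustering."""
    n = len(points)
    dims = range(len(points[0]))
    grand = [sum(p[d] for p in points) / n for d in dims]
    total_ss = sum((p[d] - grand[d]) ** 2 for p in points for d in dims)

    within_ss = 0.0
    for label in set(labels):
        members = [p for p, lab in zip(points, labels) if lab == label]
        centroid = [sum(p[d] for p in members) / len(members) for d in dims]
        within_ss += sum((p[d] - centroid[d]) ** 2 for p in members for d in dims)

    return 1 - within_ss / total_ss


# Four made-up points forming two obvious groups.
points = [(0, 0), (0, 1), (10, 0), (10, 1)]
one_cluster = r_square(points, [0, 0, 0, 0])
two_clusters = r_square(points, [0, 0, 1, 1])
print(one_cluster, two_clusters)
```

With a single cluster nothing is explained, so the value is zero. The elbow is the point where adding another cluster stops raising the value by much.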

In a real task, I would examine all of these numbers of clusters to see what they produce, discuss them with other project stakeholders and see if we can generate human-friendly groupings. But to save time here, I will just look at the 3-cluster solution.

Looking at the 3-cluster solution, we can see that cluster 3 is a group of wealthy nations with high life expectancy, high internet usage, high income, high urbanisation and high polity score. Cluster 2 is a group of poor nations with very low development, as shown by the lowest levels of internet usage, life expectancy, urbanisation and polity score. Cluster 1 is somewhere in between but closer to cluster 2; it is the lower-middle stage of development.

To examine the 3-cluster solution more closely I need to plot it, but because plots with more than 2 dimensions are very hard to interpret, I need to reduce the picture to a 2D scatter plot. To do this I will use canonical discriminant analysis to reduce the 5 variables to 2 key variables I can visualise.

Here I have plotted the 2 canonical variables that capture the most variation, with the points coloured by cluster. This shows that clusters 1 and 2 are each tightly packed, but the boundary between them is blurred, which suggests a 2-cluster solution would be satisfactory; the elbow curve above also shows no huge improvement from 2 to 3 clusters. Cluster 3, in green, is more distinct but is spread over a larger area, so it is less compact than the other clusters.

Another check of how well the clusters are differentiated is to take an external variable and test for significant differences in it between the groups. In this case, I will test for differences in HIV rates.

A quick count of high-HIV countries (HIV rate of 2% or more) by cluster shows that 25 of the 27 in the dataset are in cluster 2, 2 are in cluster 1 and none are in cluster 3.

The ANOVA with a Tukey post hoc test showed a statistically significant difference (p-value <0.0001), with cluster 2 being significantly different in HIV rate from both cluster 1 and cluster 3. This is illustrated in the box plot, which shows much higher rates of HIV for cluster 2.
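To show what the ANOVA is comparing, here is a minimal Python sketch of the one-way F statistic: the variation between group means divided by the variation within groups. The three groups are invented numbers, not the HIV rates from the dataset, and the sketch stops at the F value without computing a p-value or the Tukey comparisons.

```python
from statistics import mean


def one_way_f(groups):
    """F statistic for a one-way ANOVA over a list of groups of values."""
    all_values = [v for g in groups for v in g]
    grand = mean(all_values)
    k = len(groups)       # number of groups
    n = len(all_values)   # total number of observations
    between_ss = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)
    within_ss = sum((v - mean(g)) ** 2 for g in groups for v in g)
    return (between_ss / (k - 1)) / (within_ss / (n - k))


# Made-up values for three groups (illustrative only).
f_stat = one_way_f([[1, 2, 3], [2, 3, 4], [5, 6, 7]])
print(f_stat)
```

A large F value means the group means differ by more than the spread within the groups would suggest, which is what a small p-value reflects.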


LIBNAME mydata "/courses/d1406ae5ba27fe300 " access=readonly;

DATA new;
set mydata.gapminder;
LABEL hivrate="HIV Prevalence" incomeperperson="Dollars per person per year"
urbanrate="% of people living in urban areas";

/*subsetting the data to remove nulls*/
IF hivrate ne .;
IF incomeperperson ne .;
IF lifeexpectancy ne .;
IF urbanrate ne .;
IF internetuserate ne .;
/*SAS sets the default target variable to be the lowest number, so if 1 is the target, set false to 2*/
if hivrate >= 2 then hiv_high = 1;
if hivrate < 2 then hiv_high = 2;

/*income in thousands of dollars*/
income_k = incomeperperson / 1000;

keep country hivrate hiv_high  income_k internetuserate urbanrate lifeexpectancy polityscore;
run;


data new; set new;
* delete observations with missing data. do in separate data so only apply to kept columns, not all as done if in same step above;
if cmiss(of _all_) then delete;
run;

ods graphics on;

proc standard data=new out=clustvar mean=0 std=1;
var income_k internetuserate urbanrate lifeexpectancy polityscore;

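/*don't know how many clusters we want, so run over a range*/
%macro kmean(K);
/*takes in the standardised data. note &K. controls the output names and the number of clusters*/
proc fastclus data=clustvar out=outdata&K. outstat=cluststat&K. maxclusters= &K. maxiter=300;
var income_k internetuserate urbanrate lifeexpectancy polityscore;
run;
%mend kmean;

/*run the macro for 1 to 9 clusters*/
%kmean(1); %kmean(2); %kmean(3);
%kmean(4); %kmean(5); %kmean(6);
%kmean(7); %kmean(8); %kmean(9);
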
* extract r-square values from each cluster solution and then merge them to plot elbow curve;
/*over_all holds the overall r-square value when _type_ is filtered to RSQ.
nclust records the number of clusters so it can go on the x axis of the plot*/
%macro rsq;
%do k=1 %to 9;
data clus&k.;
set cluststat&k.;
nclust=&k.;
if _type_='RSQ';
keep nclust over_all;
run;
%end;
%mend rsq;
%rsq;

data clusrsquare;
set clus1 clus2 clus3 clus4 clus5 clus6 clus7 clus8 clus9;
run;

* plot elbow curve using r-square values;
/*Display parameters, interpol=join means connect points with line*/
symbol1 color=blue interpol=join;
proc gplot data=clusrsquare;
plot over_all*nclust;

/*examine the 3-cluster solution in greater detail
use canonical discriminant analysis to reduce dimensions*/
proc candisc data=outdata3 out=clustcan;
class cluster;
var income_k internetuserate urbanrate lifeexpectancy polityscore;
run;

proc sgplot data=clustcan;
scatter y=can2 x=can1 / group=cluster;
run;

/*See if there's any significant difference in HIV rate between the clusters*/
proc sort data = outdata3; by cluster; run;

/*count the high-HIV countries in each cluster*/
proc sql;
select cluster
 , count(*) as high_hiv
from outdata3
where hiv_high = 1
group by cluster;
quit;

/*tukey test to see if difference in HIV rates between categorical cluster groups*/
proc anova data=outdata3;
class cluster;
model hivrate= cluster;
means cluster/tukey;
run;
quit;

The nation’s largest retailer is known for sprawling suburban and rural stores. Now Wal-Mart is moving into city centers — sometimes despite strong local opposition.

NPR compiled data on the locations of Wal-Marts in three American cities. For each of these cities, we used census data to estimate what percentage of the population was within 1 mile of a Wal-Mart. In the maps below you can watch as Wal-Mart expands to reach more and more of this urban population. Ten years ago, Wal-Mart had no stores in any of these cities; today they have 20. In Washington, D.C., three additional Wal-Marts are under development, allowing us to project the retailer’s market growth into the future.

The Urban Neighborhood Wal-Mart: A Blessing Or A Curse?

Source: Wal-Mart, U.S. Census Bureau

Credit: April Fehling, Tyler Fisher, Christopher Groskopf, Alyson Hurt, Livia Labate and Ariel Zambelich/NPR

Note: All population estimates refer to block-level 2010 Census figures.
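The piece does not say exactly how the population share was computed. As one plausible reading, the sketch below sums the population of census blocks whose centroid lies within 1 mile of any store, using the haversine great-circle distance. The coordinates and populations are hypothetical, and treating each block as a single centroid point is an assumption.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius


def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance in miles between two lat/lon points."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = phi2 - phi1
    dlmb = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_MILES * asin(sqrt(a))


def share_within(blocks, stores, radius_miles=1.0):
    """Fraction of population in blocks within radius_miles of any store.

    blocks: list of (lat, lon, population); stores: list of (lat, lon).
    """
    total = sum(pop for _, _, pop in blocks)
    covered = sum(
        pop
        for lat, lon, pop in blocks
        if any(haversine_miles(lat, lon, s_lat, s_lon) <= radius_miles
               for s_lat, s_lon in stores)
    )
    return covered / total


# Hypothetical store and block centroids (illustrative values only).
stores = [(38.90, -77.03)]
blocks = [(38.905, -77.03, 100), (38.95, -77.03, 300)]
share = share_within(blocks, stores)
print(share)
```

The first block sits about a third of a mile from the store and the second about three and a half miles away, so only the first block's population counts toward the share.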

An astounding 26 percent of black males in the United States report seeing someone shot before turning 12.

Conditional on reported exposure to violence, black and white young males are equally likely to engage in violent behavior.
—  Aliprantis, Dionissi, 2014. “Human Capital in the Inner City,” Federal Reserve Bank of Cleveland, working paper no. 13-02R.

#onthisday in 1873, Boston’s Registrar issued a report on the births, marriages, and deaths in the city during 1872.  Some of the Registrar’s personal opinions may have crept into the report. Read some of these sample pages and see what you find!

Report of the City Registrar, Proceedings of the City Council, Collection 0100.001, Docket 1873-0331-G, Boston City Archives