basecodeit.com
How to calculate the size of the WHOLE Internet with AWS EMR and Apache Spark
Processing petabytes of data in a couple of hours without spending a fortune

The idea

Well... actually, a sample of the internet, thanks to http://commoncrawl.org/

The idea is to use a Spark cluster provided by AWS EMR to calculate the average size of a sample of the internet. We'll do