strapdata.com
Fast Spark aggregation with Elassandra
Apache Spark is often used to aggregate data from various datastores such as Apache Cassandra, Elasticsearch, or HDFS files. Columnar storage such as Elasticsearch, Apache Solr, or the Parquet file format is well known to filter and aggregate data very efficiently. Unfortunately, Spark does not yet push down aggregation queries (see SPARK-12686), and recomputes aggregation […]