Analytics Stack

With new analytics technologies coming out so fast, it's hard to keep up with the best tool for the job. Take the Berkeley Data Analytics Stack (BDAS), featuring Spark, Shark, and Mesos, for advanced analytics and mining. Should I use this or stick with Apache Hadoop, Hive, and Mahout? How do you decide? In my experience, this is the most common stack:

Configuration:

  • Hadoop: distributed file system (HDFS) for data collection
  • Database: HBase or Cassandra to enable random reads
  • Analysis: Hive, Pig, Impala for advanced analysis
  • Real-Time: Storm or Spark
  • Visualization: Tableau Software or, if you have programmers, D3.js
  • Applications: Datameer, Alpine Data Labs, WibiData, Wise.io, others?
  • Infrastructure: On-premise or Hosted?
  • Add-ons: Hue, Sqoop, and Flume.
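
One way to keep these options straight is to write the layers down as plain data and check your picks against them. Here is a minimal Python sketch of that idea. The names `STACK` and `choose` are hypothetical, made up for illustration, and the layer labels simply mirror the list above (with "HDFS" standing in for the Hadoop distributed file system); nothing here talks to a real cluster.

```python
# Hypothetical sketch: the layers from the list above, written as plain data.
# STACK and choose() are illustrative names, not part of any real tool.
STACK = {
    "Hadoop": ["HDFS"],
    "Database": ["HBase", "Cassandra"],
    "Analysis": ["Hive", "Pig", "Impala"],
    "Real-Time": ["Storm", "Spark"],
    "Visualization": ["Tableau Software", "D3.js"],
    "Applications": ["Datameer", "Alpine Data Labs", "WibiData", "Wise.io"],
    "Infrastructure": ["On-premise", "Hosted"],
    "Add-ons": ["Hue", "Sqoop", "Flume"],
}


def choose(stack, choices):
    """Validate one pick per named layer and return the picks as a dict."""
    selected = {}
    for layer, tool in choices.items():
        if layer not in stack:
            raise ValueError(f"unknown layer: {layer!r}")
        if tool not in stack[layer]:
            raise ValueError(f"{tool!r} is not a listed option for {layer}")
        selected[layer] = tool
    return selected


if __name__ == "__main__":
    config = choose(STACK, {"Database": "Cassandra", "Real-Time": "Storm"})
    for layer, tool in config.items():
        print(f"{layer}: {tool}")
```

Layers you leave out of `choices` stay undecided, which fits how these stacks usually come together: you settle storage and the database first and pick the rest later.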

Example of a possible configuration

Is this generally what you see? Are there additional components I am missing? Feel free to leave a comment or contact me directly.
