Rapid and smooth
Our big data team quickly took a liking to Apache Spark. The engine delivers fast execution, both in memory and for operations on hard disk, and it is convenient to use. It is no surprise that Spark has since become so widespread and is now regarded, alongside Hadoop, as the most popular big data framework. The Chinese search engine provider Baidu relies on it, as does NASA for its Deep Space Network. We deploy Spark in big data projects with large volumes of sensor data, for example, to produce fast evaluations.
Combinable with Hadoop
Spark is a pure processing engine, not a complete stack like Hadoop. For this reason, Spark is often combined with Hadoop as the underlying infrastructure: Hadoop provides distributed data storage (HDFS), and Spark processes the data stored there. Depending on the use case, Spark can also be used without Hadoop, for example in combination with NoSQL databases such as Cassandra.
Developed with Machine Learning in mind
Spark was originally developed in 2009 by Matei Zaharia, then a doctoral student at UC Berkeley. The starting point was the limitations of the MapReduce approach for Machine Learning algorithms and interactive queries: such workloads pass over the same data many times, and MapReduce writes intermediate results back to disk between jobs instead of keeping them in memory. Backed by a growing community, Spark evolved into a general-purpose engine that excels above all at advanced data processing tasks such as Machine Learning and Stream Processing.
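To make the MapReduce limitation more concrete, here is a toy pure-Python sketch. It does not use Spark at all. It assumes a hypothetical `Disk` class standing in for distributed storage and uses a simple gradient-descent fit as the iterative algorithm; the only point is to count how often the input is read. In the MapReduce-style loop every iteration rereads the data, while the cached variant reads it once and keeps it in memory, roughly what calling `cache()` on a Spark RDD or DataFrame is meant to achieve.

```python
# Toy model (no Spark): count how often an iterative algorithm
# has to read its input from "disk".


class Disk:
    """Hypothetical stand-in for distributed storage such as HDFS."""

    def __init__(self, records):
        self._records = list(records)
        self.reads = 0  # number of full scans from "disk"

    def scan(self):
        self.reads += 1
        return list(self._records)


def gradient_step(w, data, lr=0.01):
    # One gradient-descent step for a least-squares fit of y = w * x
    grad = sum(2 * (w * x - y) * x for x, y in data) / len(data)
    return w - lr * grad


def mapreduce_style(disk, iterations):
    # Each iteration behaves like a separate job that rereads its input
    w = 0.0
    for _ in range(iterations):
        w = gradient_step(w, disk.scan())
    return w


def cached_style(disk, iterations):
    # Read once and keep the working set in memory
    # (analogous in spirit to Spark's cache())
    data = disk.scan()
    w = 0.0
    for _ in range(iterations):
        w = gradient_step(w, data)
    return w


points = [(x, 3.0 * x) for x in range(1, 6)]  # true slope is 3.0
d1, d2 = Disk(points), Disk(points)
w1 = mapreduce_style(d1, 50)
w2 = cached_style(d2, 50)
print(d1.reads, d2.reads)  # 50 1
```

Both variants compute the same slope; only the number of reads differs. On a real cluster each of those extra reads is a full scan of distributed storage, which is why keeping the working set in memory pays off for iterative workloads.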