Spark is an up-and-coming open source distributed computing framework from the UC Berkeley AMPLab. Its core abstraction, the RDD (resilient distributed dataset), lets it elegantly unify batch and stream processing in a single framework.
Originally built to solve distributed machine learning problems, Spark has quickly proven to be the Swiss Army knife of big data. Companies joining the big data movement can store mounds of data, but they remain stuck with one perplexing problem: extracting business intelligence is extremely difficult, even with existing tools that sit on top of Hadoop. Spark helps solve this problem by providing a rich, accessible, and expressive API that makes working with big data far easier.