Cascading is a Java application framework that runs atop Hadoop. With its distinction between “data integration” and “data processing” (i.e., where to find your data and what to do with it) Cascading lends itself to developing testable components at both the unit- and integration-test levels. To explore the Cascading framework and how
change.org has come to find it a very useful abstraction layer over Hadoop, I'll run through an example application that we have built to run on Elastic MapReduce as a verification component in our ETL pipeline, with a focus on how the Cascading framework has enabled us to have a continuous integration-backed development/deployment model that affords us a level of security even when dealing with billions of rows of mission-critical data.