Wednesday, October 2 • 3:10pm - 3:20pm
Lightning Talk: ETL Validation with Cascading on Elastic MapReduce


Cascading is a Java application framework that runs atop Hadoop. With its distinction between “data integration” and “data processing” (i.e., where to find your data and what to do with it), Cascading lends itself to developing testable components at both the unit- and integration-test levels. To explore the Cascading framework and how Change.org has come to find it a very useful abstraction layer over Hadoop, I'll run through an example application that we built to run on Elastic MapReduce as a verification component in our ETL pipeline. The focus is on how Cascading has enabled a continuous integration-backed development and deployment model that affords us a level of security even when dealing with billions of rows of mission-critical data.
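As a rough illustration of why separating “where to find your data” from “what to do with it” helps testing, here is a minimal sketch in plain Java. It deliberately does not use Cascading's API, and it is not the application from the talk: the class name, the method names, and the "three non-empty tab-separated fields" rule are all hypothetical, chosen only to show the shape of the idea.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

// Illustrative only: plain Java, NOT the Cascading API. All names are hypothetical.
class ValidationSketch {

    // "Data processing": pure logic that knows nothing about where rows come from.
    // Returns the rows that break an assumed rule: exactly three tab-separated,
    // non-empty fields per row.
    static List<String> invalidRows(List<String> rows) {
        List<String> bad = new ArrayList<>();
        for (String row : rows) {
            // limit -1 keeps trailing empty fields so they can be detected
            String[] fields = row.split("\t", -1);
            boolean ok = fields.length == 3;
            for (String field : fields) {
                if (field.isEmpty()) {
                    ok = false;
                }
            }
            if (!ok) {
                bad.add(row);
            }
        }
        return bad;
    }

    // "Data integration": the source is injected, so a test can pass an
    // in-memory fixture while a production job would pass a real reader.
    static List<String> validate(Supplier<List<String>> source) {
        return invalidRows(source.get());
    }

    public static void main(String[] args) {
        // In-memory fixture standing in for a real data source.
        List<String> fixture = List.of("1\talice\tCA", "2\t\tNY", "3\tbob");
        System.out.println(validate(() -> fixture));
    }
}
```

Because the validation logic only sees a list of rows, the same code can be exercised against a small fixture in a unit test and against a real source in an integration run, which is the property the abstract attributes to Cascading's design.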


Vijay Ramesh

Software Engineer, Data Science, Change.org
In the land of the night the ship of the sun is drawn by the Grateful Dead

Wednesday October 2, 2013 3:10pm - 3:20pm PDT
Fort Mason Center 2 Marina Blvd San Francisco, CA 94123