Wednesday, October 2 • 3:10pm - 3:20pm
Lightning Talk: ETL Validation with Cascading on Elastic MapReduce

Cascading is a Java application framework that runs atop Hadoop. Its distinction between “data integration” and “data processing” (i.e., where to find your data and what to do with it) lends itself to developing testable components at both the unit- and integration-test levels. To explore the Cascading framework and how Change.org has come to find it a very useful abstraction layer over Hadoop, I'll run through an example application that we built to run on Elastic MapReduce as a verification component in our ETL pipeline. The focus will be on how Cascading has enabled a continuous integration-backed development and deployment model that affords us a level of security even when dealing with billions of rows of mission-critical data.

Speakers
Vijay Ramesh

Software Engineer, Data Science, Change.org
In the land of the night the ship of the sun is drawn by the Grateful Dead


Wednesday October 2, 2013 3:10pm - 3:20pm
Fort Mason Center, 2 Marina Blvd, San Francisco, CA 94123