Running Spark Tasks in Docker


  • Access the analytics pipeline shell:

    make analytics-pipeline-shell

  • Generate egg files:

    If you plan to run Spark workflows whose imports rely on a plugin mechanism, you must store those imports locally as egg files. The egg files are then listed in the spark section of the configuration file. Opaque Keys is one such import. The two egg files used by Spark can be generated as follows:

    make generate-spark-egg-files

  • Run a Spark task from the analytics pipeline shell, for example:

    launch-task UserActivityTaskSpark --local-scheduler --interval 2017-03-16
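For context, Spark itself accepts extra Python dependencies such as egg files through the `--py-files` option of `spark-submit`, which takes a comma-separated list of `.egg`, `.zip`, or `.py` files. The sketch below is only an illustration for running a standalone PySpark job by hand and is not necessarily how the pipeline passes the eggs internally. The `egg_list` helper, the egg directory, and `my_job.py` are all hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the pipeline): print every .egg file
# in a directory as one comma-separated list, the format that
# spark-submit's --py-files option expects.
egg_list() {
  local dir="$1"
  local files=()
  local f
  for f in "$dir"/*.egg; do
    # If nothing matched, the glob stays literal; skip it.
    [ -e "$f" ] && files+=("$f")
  done
  # "${files[*]}" joins array elements with the first character of IFS.
  local IFS=,
  echo "${files[*]}"
}
```

Example usage, assuming the eggs sit in a hypothetical /path/to/eggs directory: `spark-submit --py-files "$(egg_list /path/to/eggs)" my_job.py`.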