Load records into elasticsearch clusters.
edx.analytics.tasks.common.elasticsearch_load.ElasticsearchIndexTask(*args, **kwargs)
Index a stream of documents in an elasticsearch index.
This task is intended to do the following:
* Create a new index that is unique to this task run (all significant parameters).
* Load all of the documents into this unique index.
* If the alias is already pointing at one or more indexes, switch it so that it only points at this newly loaded index.
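The three steps above can be sketched without a real cluster. The following is a minimal illustration only: `cluster` is a plain dictionary standing in for elasticsearch, and `unique_index_name` and `load_and_swap` are hypothetical helpers, not part of the pipeline. The sketch leaves the previous index in place after the alias moves; what the real task does with old indexes is not stated here.

```python
import hashlib


def unique_index_name(alias, params):
    """Derive an index name that is unique to one set of task parameters."""
    # Sort the items so that the same parameters always hash to the same name.
    digest = hashlib.sha1(repr(sorted(params.items())).encode("utf-8")).hexdigest()
    return "{}_{}".format(alias, digest[:12])


def load_and_swap(cluster, alias, params, documents):
    """Load documents into a fresh index, then point the alias only at it.

    `cluster` is a stand-in for a real cluster:
    {"indexes": {name: [docs]}, "aliases": {alias: set(index names)}}
    """
    index = unique_index_name(alias, params)
    # Step 1: create a new index for this task run.
    cluster["indexes"][index] = []
    # Step 2: load every document into the new index.
    cluster["indexes"][index].extend(documents)
    # Step 3: repoint the alias so that it references only the new index.
    cluster["aliases"][alias] = {index}
    return index


cluster = {
    "indexes": {"roster_old": [{"id": 0}]},
    "aliases": {"roster": {"roster_old"}},
}
new_index = load_and_swap(
    cluster, "roster", {"date": "2015-01-01"}, [{"id": 1}, {"id": 2}]
)
```

Because readers only ever query the alias, they see either the old data or the complete new data, never a partially loaded index.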
Support executing map reduce tasks.
edx.analytics.tasks.common.mapreduce.MapReduceJobTask(*args, **kwargs)
Execute a map reduce job, typically using Hadoop, although the job can also be executed in process.
edx.analytics.tasks.common.mapreduce.MultiOutputMapReduceJobTask(*args, **kwargs)
Produces multiple output files from a map reduce job.
The mapper output tuple key is used to determine the name of the file that reducer results are written to. Different reduce tasks must not write to the same file. Since all values for a given mapper output key are guaranteed to be processed by the same reduce task, we only allow a single file to be output per key for safety. In the future, the reducer output key could be used to determine the output file name, however.
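The one-file-per-key rule can be illustrated in process, without Hadoop. This is a sketch under stated assumptions: `write_one_file_per_key` and the `<key>.txt` naming scheme are hypothetical and only demonstrate that the mapper output key alone determines the output file.

```python
import os
import tempfile
from collections import defaultdict


def write_one_file_per_key(mapper_output, output_dir):
    """Group (key, value) pairs by key and write each group to its own file."""
    grouped = defaultdict(list)
    for key, value in mapper_output:
        grouped[key].append(value)

    paths = {}
    for key, values in grouped.items():
        # The key alone determines the file name, so no two keys share a file.
        path = os.path.join(output_dir, "{}.txt".format(key))
        with open(path, "w") as output_file:
            for value in values:
                output_file.write("{}\n".format(value))
        paths[key] = path
    return paths


output_dir = tempfile.mkdtemp()
pairs = [("course_a", "enroll"), ("course_b", "enroll"), ("course_a", "unenroll")]
paths = write_one_file_per_key(pairs, output_dir)
```

Since every value for a key reaches the same group, each file is written exactly once and by a single writer.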
Support for loading data into a Mysql database.
edx.analytics.tasks.common.mysql_load.IncrementalMysqlInsertTask(*args, **kwargs)
A MySQL table that is mostly appended to, but occasionally has parts of it overwritten.
When overwriting, the task is responsible for populating some records that need to be easy to identify. There should be a one-to-one relationship between a row and the task that was used to write it. It should be straightforward to construct a where clause that selects all of the rows generated by this task.
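The overwrite behaviour described above amounts to deleting the rows that one task run owns and re-inserting them in a single transaction. The sketch below uses the standard library `sqlite3` module as a stand-in for MySQL; the `daily_counts` table, its columns, and `overwrite_rows_for_task` are all hypothetical.

```python
import sqlite3


def overwrite_rows_for_task(connection, run_date, rows):
    """Replace exactly the rows that one task run is responsible for."""
    with connection:  # commits on success, rolls back on error
        # The where clause selects every row written by this task run.
        connection.execute("DELETE FROM daily_counts WHERE run_date = ?", (run_date,))
        connection.executemany(
            "INSERT INTO daily_counts (run_date, course_id, total) VALUES (?, ?, ?)",
            [(run_date, course_id, total) for course_id, total in rows],
        )


connection = sqlite3.connect(":memory:")
connection.execute(
    "CREATE TABLE daily_counts (run_date TEXT, course_id TEXT, total INTEGER)"
)
overwrite_rows_for_task(connection, "2015-01-01", [("course_a", 3)])
overwrite_rows_for_task(connection, "2015-01-02", [("course_a", 5)])
# Re-running the first task overwrites only its own rows.
overwrite_rows_for_task(connection, "2015-01-01", [("course_a", 4), ("course_b", 1)])
result = connection.execute(
    "SELECT run_date, course_id, total FROM daily_counts ORDER BY run_date, course_id"
).fetchall()
```

Here `run_date` plays the role of the easily identified marker: each row belongs to exactly one task run, so the delete never touches rows written by another run.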
edx.analytics.tasks.common.mysql_load.MysqlInsertTask(*args, **kwargs)
A task for inserting a data set into an RDBMS.
Gather data using Sqoop table dumps run on RDBMS databases.
edx.analytics.tasks.common.sqoop.SqoopImportFromMysql(*args, **kwargs)
An abstract task that uses Sqoop to read data out of a MySQL database and writes it to a file in CSV format.
By default, the output format is defined by the --mysql-delimiters option, which applies the defaults used by the mysqldump tool.
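A file written with these defaults can be read back with the standard library `csv` module. The sketch below assumes the delimiter set that the Sqoop documentation gives for `--mysql-delimiters` (fields terminated by a comma, lines by a newline, backslash as the escape character, and fields optionally enclosed by single quotes); `read_mysql_delimited` is a hypothetical helper and the sample rows are invented for illustration.

```python
import csv
import io


def read_mysql_delimited(text):
    """Parse text that uses the assumed mysqldump-style delimiters."""
    reader = csv.reader(
        io.StringIO(text, newline=""),
        delimiter=",",      # fields terminated by a comma
        quotechar="'",      # fields optionally enclosed by single quotes
        escapechar="\\",    # special characters escaped with a backslash
        doublequote=False,  # a quote inside a field is escaped, not doubled
    )
    return [row for row in reader]


# The second field contains both an escaped quote and a comma.
sample = "1,'O\\'Brien, Pat',honor\n2,'Lee',verified\n"
rows = read_mysql_delimited(sample)
```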
edx.analytics.tasks.common.sqoop.SqoopImportTask(*args, **kwargs)
An abstract task that uses Sqoop to read data out of a database and writes it to a file in CSV format.
Import data from external RDBMS databases specific to enterprise into Hive.
edx.analytics.tasks.enterprise.enterprise_database_imports.ImportBenefitTask(*args, **kwargs)
Ecommerce: Imports offer benefit information from an ecommerce table to a destination directory and a HIVE metastore.
edx.analytics.tasks.enterprise.enterprise_database_imports.ImportConditionalOfferTask(*args, **kwargs)
Ecommerce: Imports conditional offer information from an ecommerce table to a destination directory and a HIVE metastore.
edx.analytics.tasks.enterprise.enterprise_database_imports.ImportDataSharingConsentTask(*args, **kwargs)
Imports the consent_datasharingconsent table to S3/Hive.
edx.analytics.tasks.enterprise.enterprise_database_imports.ImportEnterpriseCourseEnrollmentUserTask(*args, **kwargs)
Imports the enterprise_enterprisecourseenrollment table to S3/Hive.
edx.analytics.tasks.enterprise.enterprise_database_imports.ImportEnterpriseCustomerTask(*args, **kwargs)
Imports the enterprise_enterprisecustomer table to S3/Hive.
edx.analytics.tasks.enterprise.enterprise_database_imports.ImportEnterpriseCustomerUserTask(*args, **kwargs)
Imports the enterprise_enterprisecustomeruser table to S3/Hive.
edx.analytics.tasks.enterprise.enterprise_database_imports.ImportStockRecordTask(*args, **kwargs)
Ecommerce: Imports the partner_stockrecord table from the ecommerce database to a destination directory and a HIVE metastore.
A voucher is a discount coupon that can be applied to ecommerce purchases.
edx.analytics.tasks.enterprise.enterprise_database_imports.ImportUserSocialAuthTask(*args, **kwargs)
Imports the social_auth_usersocialauth table to S3/Hive.
edx.analytics.tasks.enterprise.enterprise_database_imports.ImportVoucherTask(*args, **kwargs)
Ecommerce: Imports the voucher_voucher table from the ecommerce database to a destination directory and a HIVE metastore.
A voucher is a discount coupon that can be applied to ecommerce purchases.
A canonical calendar that can be joined with other tables to provide information about dates.
edx.analytics.tasks.insights.calendar_task.CalendarTableTask(*args, **kwargs)
Ensure a hive table exists for the calendar so that we can perform joins.
edx.analytics.tasks.insights.calendar_task.CalendarTask(*args, **kwargs)
Generate a canonical calendar.
This table provides information about every day in every year that is being analyzed. It captures many complex details associated with calendars and standardizes references to concepts like “weeks” since they can be defined in different ways by various systems.
It is also intended to contain business-specific metadata about dates in the future, such as fiscal year boundaries, fiscal quarter boundaries and even holidays or other days of special interest for analysis purposes.
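One way to standardize "weeks" is to adopt ISO 8601 week numbering for every row. The sketch below is an assumption about what such a table could contain, not the columns the task actually emits; `calendar_rows` is a hypothetical helper.

```python
from datetime import date, timedelta


def calendar_rows(start, end):
    """Yield one row per day in [start, end), using ISO 8601 week numbering."""
    day = start
    while day < end:
        iso_year, iso_week, iso_weekday = day.isocalendar()
        yield (
            day.isoformat(),
            day.year,
            day.month,
            day.day,
            iso_year,
            iso_week,
            iso_weekday,
        )
        day += timedelta(days=1)


rows = list(calendar_rows(date(2014, 12, 28), date(2015, 1, 2)))
```

The range above straddles a year boundary on purpose: 2014-12-29 has calendar year 2014 but belongs to ISO week 1 of 2015, which is the kind of detail a canonical calendar lets other tables join against rather than recompute.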
Import data from external RDBMS databases into Hive.
edx.analytics.tasks.insights.database_imports.ImportAllDatabaseTablesTask(*args, **kwargs)
Imports a set of database tables from an external LMS RDBMS.
edx.analytics.tasks.insights.database_imports.ImportAuthUserProfileTask(*args, **kwargs)
Imports user demographic information from an external LMS DB to both a destination directory and a HIVE metastore.
edx.analytics.tasks.insights.database_imports.ImportAuthUserTask(*args, **kwargs)
Imports user information from an external LMS DB to a destination directory.
edx.analytics.tasks.insights.database_imports.ImportCouponVoucherIndirectionState(*args, **kwargs)
Ecommerce: Current: Imports the voucher_couponvouchers table from the ecommerce database to a destination directory and a HIVE metastore.
This table is just an extra layer of indirection in the source schema design and is required to translate a ‘couponvouchers_id’ into a coupon id. Coupons are represented as products in the product table, which is imported separately. A coupon can have many voucher codes associated with it.
edx.analytics.tasks.insights.database_imports.ImportCouponVoucherState(*args, **kwargs)
Ecommerce: Current: Imports the voucher_couponvouchers_vouchers table from the ecommerce database to a destination directory and a HIVE metastore.
A coupon can have many voucher codes associated with it. This table associates voucher IDs with ‘couponvouchers_id’s, which are stored in the voucher_couponvouchers table and have a 1:1 relationship to coupon IDs.
edx.analytics.tasks.insights.database_imports.ImportCourseEntitlementTask(*args, **kwargs)
Imports the table containing learners’ course entitlements.
edx.analytics.tasks.insights.database_imports.ImportCourseModeTask(*args, **kwargs)
Course Information: Imports course_modes table to both a destination directory and a HIVE metastore.
edx.analytics.tasks.insights.database_imports.ImportCourseUserGroupTask(*args, **kwargs)
Imports course cohort information from an external LMS DB to both a destination directory and a HIVE metastore.
edx.analytics.tasks.insights.database_imports.ImportCourseUserGroupUsersTask(*args, **kwargs)
Imports user cohort information from an external LMS DB to both a destination directory and a HIVE metastore.
edx.analytics.tasks.insights.database_imports.ImportCurrentOrderDiscountState(*args, **kwargs)
Ecommerce: Current: Imports current order discount records from an ecommerce table to a destination directory and a HIVE metastore.
edx.analytics.tasks.insights.database_imports.ImportCurrentOrderLineState(*args, **kwargs)
Ecommerce: Current: Imports current order line items from an ecommerce table to a destination directory and a HIVE metastore.
edx.analytics.tasks.insights.database_imports.ImportCurrentOrderState(*args, **kwargs)
Ecommerce: Current: Imports current orders from an ecommerce table to both a destination directory and a HIVE metastore.
edx.analytics.tasks.insights.database_imports.ImportCurrentRefundRefundLineState(*args, **kwargs)
Ecommerce: Current: Imports current refund line items from an ecommerce table to both a destination directory and a HIVE metastore.
edx.analytics.tasks.insights.database_imports.ImportEcommercePartner(*args, **kwargs)
Ecommerce: Current: Imports Partner information from an ecommerce table to a destination directory and a HIVE metastore.
edx.analytics.tasks.insights.database_imports.ImportEcommerceUser(*args, **kwargs)
Ecommerce: Users: Imports users from an external ecommerce table to a destination directory.
edx.analytics.tasks.insights.database_imports.ImportGeneratedCertificatesTask(*args, **kwargs)
edx.analytics.tasks.insights.database_imports.ImportIntoHiveTableTask(*args, **kwargs)
Abstract class to import data into a Hive table.
Requires four properties and a requires() method to be defined.
edx.analytics.tasks.insights.database_imports.ImportMysqlToHiveTableTask(*args, **kwargs)
Dumps data from an RDBMS table and imports it into Hive.
Requires the table_name and columns properties to be overridden.
edx.analytics.tasks.insights.database_imports.ImportPersistentCourseGradeTask(*args, **kwargs)
Imports the grades_persistentcoursegrade table to S3/Hive.
edx.analytics.tasks.insights.database_imports.ImportProductCatalog(*args, **kwargs)
Ecommerce: Products: Imports product catalog from an external ecommerce table to both a destination directory and a HIVE metastore.
edx.analytics.tasks.insights.database_imports.ImportProductCatalogAttributeValues(*args, **kwargs)
Ecommerce: Products: Imports product catalog attribute values from an external ecommerce table to both a destination directory and a HIVE metastore.
edx.analytics.tasks.insights.database_imports.ImportProductCatalogAttributes(*args, **kwargs)
Ecommerce: Products: Imports product catalog attributes from an external ecommerce table to both a destination directory and a HIVE metastore.
edx.analytics.tasks.insights.database_imports.ImportProductCatalogClass(*args, **kwargs)
Ecommerce: Products: Imports product catalog classes from an external ecommerce table to a destination directory.
edx.analytics.tasks.insights.database_imports.ImportShoppingCartCertificateItem(*args, **kwargs)
Imports certificate items from an external LMS DB shopping cart table to both a destination directory and a HIVE metastore.
edx.analytics.tasks.insights.database_imports.ImportShoppingCartCoupon(*args, **kwargs)
Imports coupon definitions from an external LMS DB shopping cart table to both a destination directory and a HIVE metastore.
edx.analytics.tasks.insights.database_imports.ImportShoppingCartCouponRedemption(*args, **kwargs)
Imports coupon redemptions from an external LMS DB shopping cart table to both a destination directory and a HIVE metastore.
edx.analytics.tasks.insights.database_imports.ImportShoppingCartCourseRegistrationCodeItem(*args, **kwargs)
Imports course registration codes from an external ecommerce table to both a destination directory and a HIVE metastore.
edx.analytics.tasks.insights.database_imports.ImportShoppingCartDonation(*args, **kwargs)
Imports donations from an external LMS DB shopping cart table to both a destination directory and a HIVE metastore.
edx.analytics.tasks.insights.database_imports.ImportShoppingCartOrder(*args, **kwargs)
Imports orders from an external LMS DB shopping cart table to both a destination directory and a HIVE metastore.
edx.analytics.tasks.insights.database_imports.ImportShoppingCartOrderItem(*args, **kwargs)
Imports individual order items from an external LMS DB shopping cart table to both a destination directory and a HIVE metastore.
edx.analytics.tasks.insights.database_imports.ImportShoppingCartPaidCourseRegistration(*args, **kwargs)
Imports paid course registrations from an external LMS DB shopping cart table to both a destination directory and a HIVE metastore.
edx.analytics.tasks.insights.database_imports.ImportStudentCourseEnrollmentTask(*args, **kwargs)
Imports course enrollment information from an external LMS DB to a destination directory.
Various helper utilities that are commonly used when working with Hive
edx.analytics.tasks.util.hive.BareHiveTableTask(*args, **kwargs)
Abstract class that represents the metadata associated with a Hive table.
Note that this task only ensures that the table is created. It does not populate the table with any data; it simply runs the DDL commands to create it.
Also note that it will not change the schema of the table if it already exists unless the overwrite parameter is set to True.
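The interaction between an existing table and the overwrite parameter can be sketched as DDL generation. This is an illustration only: `table_ddl` is a hypothetical helper, and dropping the table first is an assumed way of honouring overwrite, not necessarily what the task does.

```python
def table_ddl(table, columns, overwrite=False):
    """Return the HiveQL statements that ensure `table` exists."""
    column_spec = ", ".join(
        "{} {}".format(name, hive_type) for name, hive_type in columns
    )
    statements = []
    if overwrite:
        # Assumption: dropping first is how this sketch replaces an old schema.
        statements.append("DROP TABLE IF EXISTS {}".format(table))
    # IF NOT EXISTS leaves an existing table, and its schema, untouched.
    statements.append(
        "CREATE TABLE IF NOT EXISTS {} ({})".format(table, column_spec)
    )
    return statements


columns = [("id", "INT"), ("username", "STRING")]
default_ddl = table_ddl("auth_user", columns)
overwrite_ddl = table_ddl("auth_user", columns, overwrite=True)
```

Neither statement list loads any rows, which matches the note above: the task is concerned with metadata only.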
edx.analytics.tasks.util.hive.HivePartitionTask(*args, **kwargs)
Abstract class that represents the metadata associated with a partition in a Hive table.
Note that this task only ensures that the partition is created. It does not populate the partition with any data; it simply runs the DDL commands to create it.
edx.analytics.tasks.util.hive.HiveTableFromQueryTask(*args, **kwargs)
Creates a hive table from the results of a hive query.
edx.analytics.tasks.util.hive.HiveTableTask(*args, **kwargs)
Abstract class to import data into a Hive table.
Currently supports a single partition that represents the version of the table data. This allows us to use a consistent location for the table and swap out the data in the tables by simply pointing at different partitions within the folder that contain different “versions” of the table data. For example, if a snapshot is taken of an RDBMS table, we might store that in a partition with today’s date. Any subsequent jobs that need to join against that table will continue to use the data snapshot from the beginning of the day (since that is the “live” partition). However, the next time a snapshot is taken a new partition is created and loaded and becomes the “live” partition that is used in all joins etc.
Important note: this code currently does not clean up old, unused partitions. They continue to exist until some external process removes them.
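The versioned-partition layout can be sketched as follows. The `key=value` directory naming is the usual Hive convention; the partition key `dt`, the bucket path, and both helper functions are hypothetical, and treating the most recent date as the "live" partition is an assumption drawn from the description above.

```python
import posixpath


def partition_location(table_location, partition_key, partition_value):
    """Build the directory that holds one version of the table data."""
    partition_dir = "{}={}".format(partition_key, partition_value)
    return posixpath.join(table_location, partition_dir) + "/"


def live_partition(partition_values):
    """Pick the most recent snapshot; ISO dates sort correctly as strings."""
    return max(partition_values)


table_location = "s3://some-bucket/warehouse/auth_user"
snapshots = ["2015-01-01", "2015-01-03", "2015-01-02"]
live = live_partition(snapshots)
location = partition_location(table_location, "dt", live)
```

Older directories such as `dt=2015-01-01/` stay where they are, which is why an external cleanup process is needed.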
edx.analytics.tasks.util.hive.OverwriteAwareHiveQueryDataTask(*args, **kwargs)
A generalized Data task whose output is a hive table populated from a hive query.
Support URLs. Specifically, we want to be able to refer to data stored in a variety of locations and formats using a standard URL syntax.
Examples:
s3://some-bucket/path/to/file
/path/to/local/file.gz
hdfs://some/directory/
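Distinguishing these locations comes down to inspecting the URL scheme. The sketch below uses only the standard library; `storage_kind` and `is_gzipped` are hypothetical helpers and do not reflect how the module itself chooses a target.

```python
from urllib.parse import urlparse


def storage_kind(url):
    """Classify a URL by scheme; a bare path is treated as a local file."""
    scheme = urlparse(url).scheme
    if scheme == "s3":
        return "s3"
    if scheme == "hdfs":
        return "hdfs"
    if scheme in ("", "file"):
        return "local"
    raise ValueError("Unsupported URL scheme: {!r}".format(scheme))


def is_gzipped(url):
    """Guess the format from the file extension in the URL path."""
    return urlparse(url).path.endswith(".gz")


kinds = [
    storage_kind(url)
    for url in (
        "s3://some-bucket/path/to/file",
        "/path/to/local/file.gz",
        "hdfs://some/directory/",
    )
]
```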
edx.analytics.tasks.util.url.ExternalURL(*args, **kwargs)
Simple Task that returns a target based on its URL.
Parameters: url (Parameter)
edx.analytics.tasks.util.url.UncheckedExternalURL(*args, **kwargs)
An ExternalURL task that does not verify that the source file exists, because that check can be expensive for S3 URLs.
Parameters: url (Parameter)