Online meetup “Spark ML Pipelines [Under the Hood]” for Data Engineers

Participation is free but registration is required


On July 15, Grid Dynamics invites Data Engineers to join the online meetup “Spark ML Pipelines [Under the Hood]” as part of Dynamic Talks.

The speaker is Vitalii Monastyrev, a Big Data Engineer at Grid Dynamics. The meetup will be held in Russian.

About the talk:

Modern IT companies are actively building out the Data Science stack in their projects to forecast profits for upcoming quarters, set up targeted advertising, build recommendation systems, and much more.

Quite often, the data used to build Machine Learning models runs to hundreds of gigabytes or more. At that scale, several questions tend to arise:

  • How to work with so much data?
  • How to generate features?
  • How to train models?
  • How to coordinate work between Data Engineering and Data Science teams?

Target audience:

Data Engineers who have previously worked with Apache Spark or understand the basics of how it operates. Prior knowledge of Spark ML is not required, since the library will be covered as part of the talk.

During the meetup, you will learn about:

  • The basic features of the Spark ML library
  • Ways to combine several programming languages within a single model training process
  • Options for using Spark ML base classes to implement data processing modules
  • How to use the examples that will be available after the talk