On July 15, Grid Dynamics invites Data Engineers to join the online meetup “Spark ML Pipelines [Under the Hood]” as part of Dynamic Talks.
The speaker is Vitalii Monastyrev, a Big Data Engineer at Grid Dynamics. The meetup will be held in Russian.
About the talk:
Modern IT companies are actively building out Data Science stacks in their projects to forecast profits for upcoming quarters, tune targeted advertising, build recommendation systems, and much more.
Quite often, the data used to build Machine Learning models runs to hundreds of gigabytes or more, which raises a number of questions:
- How to work with so much data?
- How to generate features?
- How to train models?
- How to coordinate work between Data Engineering and Data Science teams?
Target audience:
Data Engineers who have previously worked with Apache Spark or understand the basics of how it works. Prior knowledge of Spark ML is not required, since the library will be covered as part of the talk.