Machine Learning with Apache Spark

To stay competitive, organizations are adopting new approaches to data processing and analysis, and data scientists are increasingly turning to machine learning. This Machine Learning with Apache Spark course teaches you to process massive amounts of data using Apache Spark's distributed compute capability and its built-in machine learning library. The intensive training provides an overview of data science algorithms as well as the theoretical and technical aspects of using the Apache Spark platform for machine learning. A variety of hands-on labs help attendees reinforce their theoretical knowledge of the material.


    • Understand what Big Data is
    • Know the difference between “data-at-rest” and “data-in-motion”
    • Understand what MapReduce / Hadoop is, and what it can do
    • Be aware of technologies for querying data in Hadoop (e.g. Hive and Pig)
    • Understand what NoSQL databases are and what they can do
    • Become familiar with the choices in the NoSQL landscape
    • Understand the strengths and weaknesses of different NoSQL technologies
    • Be well-informed on your choices in Big Data processing, and evaluate them for your needs

    Session 1. Machine Learning Algorithms

    • Supervised vs Unsupervised Machine Learning
    • Supervised Machine Learning Algorithms
    • Unsupervised Machine Learning Algorithms
    • Choose the Right Algorithm
    • Life-cycles of Machine Learning Development
    • Classifying with k-Nearest Neighbors (SL)
    • k-Nearest Neighbors Algorithm
    • The Error Rate
    • Decision Trees (SL)
    • Random Forests
    • Unsupervised Learning Type: Clustering
    • K-Means Clustering (UL)
    • K-Means Clustering in a Nutshell
    • Regression Analysis
    • Logistic Regression
    • Summary
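
As a taste of the k-Nearest Neighbors and error-rate topics in this session, here is a minimal sketch in plain Python. It is not course material: the function names `knn_predict` and `error_rate` and the toy data set are assumptions made purely for illustration, and the sketch uses Euclidean distance with a simple majority vote.

```python
import math
from collections import Counter


def knn_predict(train, query, k=3):
    """Classify `query` by majority vote among its k nearest training points.

    `train` is a list of (features, label) pairs; distance is Euclidean.
    """
    by_distance = sorted(train, key=lambda pair: math.dist(pair[0], query))
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]


def error_rate(train, test, k=3):
    """Fraction of test points whose predicted label differs from the true one."""
    wrong = sum(
        1 for features, label in test
        if knn_predict(train, features, k) != label
    )
    return wrong / len(test)


# Hypothetical toy data: two well-separated groups labeled "A" and "B".
train = [
    ((1.0, 1.0), "A"), ((1.2, 0.8), "A"), ((0.9, 1.1), "A"),
    ((5.0, 5.0), "B"), ((5.2, 4.9), "B"), ((4.8, 5.1), "B"),
]
test = [((1.1, 1.0), "A"), ((5.1, 5.0), "B")]

print(knn_predict(train, (1.1, 0.9)))
print(error_rate(train, test))
```

Sorting the whole training set on every query keeps the sketch short; it does not scale to large data sets, which is part of the motivation for distributed tools such as Spark.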

    Session 2. Introduction to Functional Programming

    • What is Functional Programming (FP)?
    • Terminology: Higher-Order Functions
    • Terminology: Lambda vs Closure
    • A Short List of Languages that Support FP
    • FP with Java
    • FP with JavaScript
    • Imperative Programming in JavaScript
    • The JavaScript map (FP) Example
    • The JavaScript reduce (FP) Example
    • Using reduce to Flatten an Array of Arrays (FP) Example
    • The JavaScript filter (FP) Example
    • Common Higher-Order Functions in Python
    • Common Higher-Order Functions in Scala
    • Elements of FP in R
    • Summary
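
The map, filter, and reduce topics listed above can be previewed with Python's built-in higher-order functions. This is a minimal sketch, not taken from the course labs; the variable names and sample data are illustrative assumptions.

```python
from functools import reduce

numbers = [1, 2, 3, 4, 5]

# map: apply a function to every element
squares = list(map(lambda x: x * x, numbers))

# filter: keep only the elements that satisfy a predicate
evens = list(filter(lambda x: x % 2 == 0, numbers))

# reduce: fold the sequence into a single value
total = reduce(lambda acc, x: acc + x, numbers, 0)

# reduce can also flatten a list of lists by concatenating them
nested = [[1, 2], [3], [4, 5]]
flat = reduce(lambda acc, xs: acc + xs, nested, [])

print(squares, evens, total, flat)
```

Each call takes a function as an argument and leaves the input list unchanged, which is the same style Spark uses for transformations on distributed data.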

    Session 3. Introduction to Apache Spark

    • What is Apache Spark
    • A Short History of Spark
    • Where to Get Spark?
    • The Spark Platform
    • Spark Logo
    • Common Spark Use Cases
    • Languages Supported by Spark
    • Running Spark on a Cluster
    • The Driver Process
    • Spark Applications
    • Spark Shell
    • The spark-submit Tool
    • The spark-submit Tool Configuration
    • The Executor and Worker Processes
    • The Spark Application Architecture
    • Interfaces with Data Storage Systems
    • Limitations of Hadoop’s MapReduce
    • Spark vs MapReduce
    • Spark as an Alternative to Apache Tez
    • The Resilient Distributed Dataset (RDD)
    • Spark Streaming (Micro-batching)
    • Spark SQL
    • Example of Spark SQL
    • Spark Machine Learning Library
    • GraphX
    • Spark vs R
    • Summary

    Session 4. The Spark Shell

    • The Spark Shell
    • The Spark Shell UI
    • Spark Shell Options
    • Getting Help
    • The Spark Context (sc) and SQL Context (sqlContext)
    • The Shell Spark Context
    • Loading Files
    • Saving Files
    • Basic Spark ETL Operations
    • Summary

    Session 5. The Spark Machine Learning Library

    • What is MLlib?
    • Supported Languages
    • MLlib Packages
    • Dense and Sparse Vectors
    • Labeled Point
    • Python Example of Using the LabeledPoint Class
    • LIBSVM format
    • An Example of a LIBSVM File
    • Loading LIBSVM Files
    • Local Matrices
    • Example of Creating Matrices in MLlib
    • Distributed Matrices
    • Example of Using a Distributed Matrix
    • Classification and Regression Algorithms
    • Clustering
    • Summary
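
To make the LIBSVM format and the dense/sparse vector topics concrete, here is a minimal sketch in plain Python that does not use Spark at all. Each LIBSVM line has the form `label index:value index:value ...` with one-based feature indices. The helper names `parse_libsvm_line` and `to_dense` and the sample line are hypothetical, chosen only for illustration; in the course itself, MLlib's own loaders would do this work.

```python
def parse_libsvm_line(line):
    """Parse one LIBSVM line into (label, {zero_based_index: value})."""
    parts = line.split()
    label = float(parts[0])
    features = {}
    for item in parts[1:]:
        index, value = item.split(":")
        # LIBSVM indices are one-based; shift to zero-based positions.
        features[int(index) - 1] = float(value)
    return label, features


def to_dense(features, size):
    """Expand a sparse {index: value} mapping into a dense list of floats."""
    dense = [0.0] * size
    for index, value in features.items():
        dense[index] = value
    return dense


# Hypothetical sample line: label 1, with non-zero features 1, 3, and 6.
sample = "1 1:0.5 3:2.0 6:1.5"
label, sparse = parse_libsvm_line(sample)

print(label)
print(sparse)
print(to_dense(sparse, 6))
```

The sparse mapping stores only the three non-zero entries, while the dense form stores all six positions; that trade-off is the reason both vector types exist.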

    Session 6. Text Mining

    • What is Text Mining?
    • The Common Text Mining Tasks
    • What is Natural Language Processing (NLP)?
    • Some of the NLP Use Cases
    • Machine Learning in Text Mining and NLP
    • Machine Learning in NLP
    • TF-IDF
    • The Feature Hashing Trick
    • Stemming
    • Example of Stemming
    • Stop Words
    • Popular Text Mining and NLP Libraries and Packages
    • Summary
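
The stop-word, TF-IDF, and feature-hashing topics above can be sketched in a few lines of plain Python. The following is an illustration under stated assumptions, not the course's or MLlib's implementation: the stop-word list is a tiny made-up sample, the IDF uses one common smoothed variant, log((N + 1) / (df + 1)), and `zlib.crc32` stands in for the hash function so that results are repeatable between runs.

```python
import math
import zlib
from collections import Counter

STOP_WORDS = {"the", "a", "is", "on", "of"}  # tiny illustrative list


def tokenize(text):
    """Lowercase, split on whitespace, and drop stop words."""
    return [w for w in text.lower().split() if w not in STOP_WORDS]


def tf_idf(docs):
    """Return one {term: weight} dict per document."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents containing each term.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    weights = []
    for tokens in tokenized:
        tf = Counter(tokens)
        weights.append({
            term: count * math.log((n_docs + 1) / (doc_freq[term] + 1))
            for term, count in tf.items()
        })
    return weights


def hashed_counts(tokens, num_buckets=16):
    """Feature hashing: map each token to a bucket and count occurrences."""
    vector = [0] * num_buckets
    for token in tokens:
        vector[zlib.crc32(token.encode("utf-8")) % num_buckets] += 1
    return vector


docs = ["the cat sat on the mat", "the dog sat on the log", "a cat is a cat"]
weights = tf_idf(docs)
hashed = hashed_counts(tokenize(docs[0]))

print(weights[0])
print(hashed)
```

In the first document, "mat" receives a higher weight than "cat" because it appears in only one document. The hashed vector has a fixed length regardless of vocabulary size, at the cost of possible collisions between terms.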

    Data Scientists, Business Analysts, Software Developers, IT Architects

    Participants should have a general knowledge of statistics and programming.

    © Euler. All rights reserved.