Table of Contents
  1. Preface
    1. Audience
    2. How This Book Is Organized
    3. Supporting Books
    4. Code Examples
    5. Early Release Status and Feedback
  2. 1. Introduction to Data Analysis with Spark
    1. What is Apache Spark?
    2. A Unified Stack
      1. Spark Core
      2. Spark SQL
      3. Spark Streaming
      4. MLlib
      5. GraphX
      6. Cluster Managers
    3. Who Uses Spark, and for What?
      1. Data Science Tasks
      2. Data Processing Applications
    4. A Brief History of Spark
    5. Spark Versions and Releases
    6. Spark and Hadoop
  3. 2. Downloading and Getting Started
    1. Downloading Spark
    2. Introduction to Spark’s Python and Scala Shells
    3. Introduction to Core Spark Concepts
    4. Standalone Applications
      1. Initializing a SparkContext
    5. Conclusion
  4. 3. Programming with RDDs
    1. RDD Basics
    2. Creating RDDs
    3. RDD Operations
      1. Transformations
      2. Actions
      3. Lazy Evaluation
    4. Passing Functions to Spark
      1. Python
      2. Scala
      3. Java
    5. Common Transformations and Actions
      1. Basic RDDs
        1. Transformations
        2. Element-wise Transformations
        3. Pseudo Set Operations
        4. Actions
      2. Converting Between RDD Types
        1. Scala
        2. Java
        3. Python
    6. Persistence (Caching)
    7. Conclusion
  5. 4. Working with Key-Value Pairs
    1. Motivation
    2. Creating Pair RDDs
    3. Transformations on Pair RDDs
      1. Aggregations
        1. Tuning the Level of Parallelism
      2. Grouping Data
      3. Joins
      4. Sorting Data
    4. Actions Available on Pair RDDs
    5. Data Partitioning
      1. Determining an RDD’s Partitioner
      2. Operations that Benefit from Partitioning
      3. Operations that Affect Partitioning
      4. Example: PageRank
      5. Custom Partitioners
    6. Conclusion
  6. 5. Loading and Saving Your Data
    1. Motivation
    2. Choosing a Format
    3. Formats
      1. Text Files
      2. JSON
      3. CSV (Comma Separated Values) / TSV (Tab Separated Values)
      4. Sequence Files
      5. Object Files
      6. Hadoop Input and Output Formats
        1. Protocol Buffers
      7. Hive and Parquet
    4. File Systems
      1. Local/“Regular” FS
        1. Amazon S3
      2. HDFS
    5. Compression
    6. Databases
      1. Elasticsearch
      2. Mongo
      3. Cassandra
      4. HBase
      5. Java Database Connectivity (JDBC)
    7. Conclusion
  7. About the Authors
  8. Copyright
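Unless otherwise stated, all articles on this blog are original.
Copyright of original articles belongs to 过往记忆大数据 (过往记忆); reproduction without permission is prohibited.
Permalink: Learning Spark (Table of Contents) (https://www.iteblog.com/learning-spark-table-of-contents/)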