Table of Contents
- Preface
- Audience
- How This Book is Organized
- Supporting Books
- Code Examples
- Early Release Status and Feedback
- 1. Introduction to Data Analysis with Spark
- What is Apache Spark?
- A Unified Stack
- Spark Core
- Spark SQL
- Spark Streaming
- MLlib
- GraphX
- Cluster Managers
- Who Uses Spark, and For What?
- Data Science Tasks
- Data Processing Applications
- A Brief History of Spark
- Spark Versions and Releases
- Spark and Hadoop
- 2. Downloading and Getting Started
- Downloading Spark
- Introduction to Spark’s Python and Scala Shells
- Introduction to Core Spark Concepts
- Standalone Applications
- Initializing a SparkContext
- Conclusion
- 3. Programming with RDDs
- RDD Basics
- Creating RDDs
- RDD Operations
- Transformations
- Actions
- Lazy Evaluation
- Passing Functions to Spark
- Python
- Scala
- Java
- Common Transformations and Actions
- Basic RDDs
- Transformations
- Element-wise transformations
- Pseudo Set Operations
- Actions
- Converting Between RDD Types
- Scala
- Java
- Python
- Persistence (Caching)
- Conclusion
- 4. Working with Key-Value Pairs
- Motivation
- Creating Pair RDDs
- Transformations on Pair RDDs
- Aggregations
- Tuning the Level of Parallelism
- Grouping Data
- Joins
- Sorting Data
- Actions Available on Pair RDDs
- Data Partitioning
- Determining an RDD’s Partitioner
- Operations that Benefit from Partitioning
- Operations that Affect Partitioning
- Example: PageRank
- Custom Partitioners
- Conclusion
- 5. Loading and Saving Your Data
- Motivation
- Choosing a Format
- Formats
- Text Files
- JSON
- CSV (Comma Separated Values) / TSV (Tab Separated Values)
- Sequence Files
- Object Files
- Hadoop Input and Output Formats
- Protocol Buffers
- Hive and Parquet
- File Systems
- Local/"Regular" FS
- Amazon S3
- HDFS
- Compression
- Databases
- Elasticsearch
- Mongo
- Cassandra
- HBase
- Java Database Connectivity (JDBC)
- Conclusion
- About the Authors
- Copyright
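Unless otherwise stated, all articles on this blog are original.
Copyright of original articles belongs to 过往记忆大数据 (过往记忆); reproduction without permission is prohibited.
Link to this article: [Learning Spark (Table of Contents)](https://www.iteblog.com/learning-spark-table-of-contents/)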