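The conference covered the following topics:

  • Future plans for Apache Spark™, Delta Lake, MLflow, and Koalas
  • Best practices for managing the machine learning lifecycle
  • Techniques for building reliable data pipelines at scale
  • The latest developments in popular deep learning and machine learning frameworks
  • Real-world AI use cases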
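    How to download

    Follow the WeChat official account 过往记忆大数据 or Java技术范 and reply spark-9832 to get the slides.

    Downloadable slides

    Slides are available for download for the following talks: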

  • Data Science Across Data Sources with Apache Arrow
  • Portable Scalable Data Visualization Techniques for Apache Spark and Python Notebook-based Analytics
  • Native Support of Prometheus Monitoring in Apache Spark 3.0
  • Performant Streaming in Production: Preventing Common Pitfalls when Productionizing Streaming Jobs
  • Scaling Security Threat Detection with Apache Spark and Databricks
  • User Defined Aggregation in Apache Spark: A Love Story
  • Powering Interactive BI Analytics with Presto and Delta Lake
  • Using AI to Support Proliferating Merchant Changes
  • Tuning ML Models: Scaling, Workflows, and Architecture
  • Battling Model Decay with Deep Learning and Gamification
  • An Approach to Data Quality for Netflix Personalization Systems
  • High-Performance Analytics with Probabilistic Data Structures: the Power of HyperLogLog
  • Preventing Abuse Using Unsupervised Learning
  • Geospatial Analytics at Scale: Analyzing Human Movement Patterns During a Pandemic
  • Leveraging Apache Spark for Scalable Data Prep and Inference in Deep Learning
  • Filtering vs Enriching Data in Apache Spark
  • Scalable Acceleration of XGBoost Training on Apache Spark GPU Clusters
  • Deep Dive into GPU Support in Apache Spark 3.x
  • Sputnik: Airbnb’s Apache Spark Framework for Data Engineering
  • Patterns and Anti-Patterns for Memorializing Data Science Project Artifacts
  • Automated and Explainable Deep Learning for Clinical Language Understanding at Roche
  • Building Understanding Out of Incomplete and Biased Datasets using Machine Learning and Databricks
  • Encryption and Masking for Sensitive Apache Spark Analytics Addressing CCPA and Governance
  • Managing ADLS gen2 using Apache Spark
  • Using Apache Spark and Differential Privacy for Protecting the Privacy of the 2020 Census Respondents
  • The 2020 Census and Innovation in Surveys
  • Scaling Data and ML with Apache Spark and Feast
  • The Apache Spark File Format Ecosystem
  • Building the Petcare Data Platform using Delta Lake and 'Kyte': Our Spark ETL Pipeline
  • A Production Quality Sketching Library for the Analysis of Big Data
  • Children Safety Retrieval (CENSER) System for Retrieval of Kidnapped Children from Brothels in India
  • Benchmark Tests and How-Tos of Convolutional Neural Network on HorovodRunner Enabled Apache Spark Clusters
  • Scalable AutoML for Time Series Forecasting using Ray
  • Using Machine Learning to Evolve Sports Entertainment
  • Using Bayesian Generative Models with Apache Spark to Solve Entity Resolution Problems (DeDup, Merging, Uniqueness) at Scale
  • Fine Tuning and Enhancing Performance of Apache Spark Jobs
  • All In - Migrating a Genomics Pipeline from BASH/Hive to Spark (Azure Databricks) - A Real World Case Study
  • Running Apache Spark on Kubernetes: Best Practices and Pitfalls
  • Lessons Learned from Modernizing USCIS Data Analytics Platform
  • On Improving Broadcast Joins in Apache Spark SQL
  • Using Databricks as an Analysis Platform
  • Is This Thing On? A Well State Model for the People
  • Advanced Natural Language Processing with Apache Spark NLP
  • Building a Streaming Microservice Architecture: with Apache Spark Structured Streaming and Friends
  • Simplify and Boost Spark 3 Deployments with Hypervisor-Native Kubernetes
  • Deploying Apache Spark Jobs on Kubernetes with Helm and Spark Operator
  • Resource-Efficient Deep Learning Model Selection on Apache Spark
  • Bring Satellite and Drone Imagery into your Data Science Workflows
  • Scoring at Scale: Generating Follow Recommendations for Over 690 Million LinkedIn Members
  • From HDFS to S3: Migrate Pinterest Apache Spark Clusters
  • SparkCruise: Automatic Computation Reuse in Apache Spark
  • Chromatic Sparse Learning
  • Deploy and Serve Model from Azure Databricks onto Azure Machine Learning
  • Cloud-Native Apache Spark Scheduling with YuniKorn Scheduler
  • The Revolution Will be Streamed
  • Democratizing PySpark for Mobile Game Publishing
  • Ray: Enterprise-Grade, Distributed Python
  • Fugue: Unifying Spark and Non-Spark Ecosystems for Big Data Analytics
  • Enabling Scalable Data Science Pipeline with MLflow at Thermo Fisher Scientific
  • Scaling Up AI Research to Production with PyTorch and MLflow
  • Best Practices for Building Robust Data Platform with Apache Spark and Delta
  • Building a Pipeline for State-of-the-Art Natural Language Processing Using Hugging Face Tools
  • Designing the Next Generation of Data Pipelines at Zillow with Apache Spark
  • Lessons from Building Large-Scale, Multi-Cloud, SaaS Software at Databricks
  • Flash for Apache Spark Shuffle with Cosco
  • Building a Real-Time Feature Store at iFood
  • AutoML Toolkit – Deep Dive
  • Operationalize Apache Spark Analytics
  • End-to-End Deep Learning with Horovod on Apache Spark
  • Building Data Quality Audit Framework using Delta Lake at Cerner
  • Zipline - A Declarative Feature Engineering Framework
  • Automating Federal Aviation Administration’s (FAA) System Wide Information Management (SWIM) Data Ingestion and Analysis
  • Apache Spark NLP for Healthcare: Lessons Learned Building Real-World Healthcare AI Systems
  • A Thorough Comparison of Delta Lake, Iceberg and Hudi
  • Productionizing Machine Learning Pipelines with Databricks and Azure ML
  • Advertising Fraud Detection at Scale at T-Mobile
  • AI-Assisted Feature Selection for Big Data Modeling
  • The Data Lake Engine Data Microservices in Spark using Apache Arrow Flight
  • Ibis: Seamless Transition Between Pandas and Apache Spark
  • Simplify CDC Pipeline with Spark Streaming SQL and Delta Lake
  • Power of Visualizing Embeddings
  • Deliver Dynamic Customer Journey Orchestration at Scale
  • Top Down Specialization Using Apache Spark
  • The Killer Feature Store: Orchestrating Spark ML Pipelines and MLflow for Production
  • Tackling Scaling Challenges of Apache Spark at LinkedIn
  • Scaling up Deep Learning by Scaling Down
  • Wood Log Inventory Estimation using Image Processing and Deep Learning Technique
  • Building Identity Graphs over Heterogeneous Data
  • Productionizing Machine Learning with Apache Spark, MLflow and ONNX from the ground to cloud using SQL Server
  • Generative Hyperloop Design: Managing Massively Scaled Simulations Focused on Quick-Insight Analytics and Demand Modelling
  • Efficiently Building Machine Learning Models for Predictive Maintenance in the Oil & Gas Industry with Databricks
  • Consolidate Your Technical Debt With Spark Data Sources - Tools and Techniques to Integrate Native Code
  • Running Emerging AI Applications on Big Data Platforms with Ray On Apache Spark
  • Best Practices for Engineering Production-Ready Software with Apache Spark
  • Automatic Forecasting using Prophet, Databricks, Delta Lake and MLflow
  • Composable Data Processing with Apache Spark
  • Accelerating Spark SQL Workloads to 50X Performance with Apache Arrow-Based FPGA Accelerators
  • Accelerating the ML Lifecycle with an Enterprise-Grade Feature Store
  • Faster Data Integration Pipeline Execution using Spark-Jobserver
  • Continuous Delivery of ML-Enabled Pipelines on Databricks using MLflow
  • Accelerating Apache Spark Shuffle for Data Analytics on the Cloud with Remote Persistent Memory Pools
  • Bucketing 2.0: Improve Spark SQL Performance by Removing Shuffle
  • How to Performance-Tune Apache Spark Applications in Large Clusters
  • Saving Energy in Homes with a Unified Approach to Data and AI
  • Productionizing Deep Reinforcement Learning with Spark and MLflow
  • SQL Performance Improvements at a Glance in Apache Spark 3.0
  • Pandas UDF and Python Type Hint in Apache Spark 3.0
  • Parallelization of Structured Streaming Jobs Using Delta Lake
  • Artificial Lawyers. Will Your Next Attorney be a Machine?
  • Adaptive Query Execution: Speeding Up Spark SQL at Runtime
  • How Azure and Databricks Enabled a Personalized Experience for Customers and Patients at CVS Health
  • Optimize the Large Scale Graph Applications by using Apache Spark with 4-5x Performance Improvements
  • Data Mesh in Practice: How Europe’s Leading Online Platform for Fashion Goes Beyond the Data Lake
  • Running Apache Spark Jobs Using Kubernetes
  • Koalas: Making an Easy Transition from Pandas to Apache Spark
  • Vectorized Deep Learning Acceleration from Preprocessing to Inference and Training on Apache Spark in SK Telecom
  • Text Extraction from Product Images Using State-of-the-Art Deep Learning Techniques
  • Care and Feeding of Catalyst Optimizer
  • Apache Spark vs Apache Spark: An On-Prem Comparison of Databricks and Open-Source Spark
  • Enabling Physics and Empirical-Based Algorithms with Spark Using the Integration of MATLAB in Databricks
  • Democratizing Data
  • Evolution is Continuous, and so are Big Data and Streaming Pipelines
  • Geospatial Options in Apache Spark
  • Scaling Production Machine Learning Pipelines with Databricks
  • Taming the Search: A Practical Way of Enforcing GDPR and CCPA in Very Large Datasets with Apache Spark
  • Zeus: Uber’s Highly Scalable and Distributed Shuffle as a Service
  • Productionizing Machine Learning with a Microservices Architecture
  • Productionalizing Models through CI/CD Design with MLflow
  • DataSource V2 and Cassandra – A Whole New World
  • Hyperspace: An Indexing Subsystem for Apache Spark
  • Data Driven Decisions at Scale
  • Deep Dive into the New Features of Apache Spark 3.0
  • Securing Apache Spark Applications at Facebook
  • Building a Feature Store around Dataframes and Apache Spark
  • Tracing the Breadcrumbs: Apache Spark Workload Diagnostics
  • Enabling Push Button Productization of AI Models
  • Everyday Probabilistic Data Structures for Humans
  • Deep Learning Enabled Price Action with Databricks and AWS
  • Clinical Suspecting at Scale Using PySpark
  • Using Apache Spark for Predicting Degrading and Failing Parts in Aviation
  • Operationalizing Big Data Pipelines At Scale
  • Columbia Migrates from Legacy Data Warehouse to an Open Data Platform with Delta Lake
  • How Adobe Does 2 Million Records Per Second Using Apache Spark!
  • Accelerating Data Processing in Spark SQL with Pandas UDFs
  • Building a Federated Data Directory Platform for Public Health
  • Translating Models to Medicine: an Example of Managing Visual Communications
  • Delta from a Data Engineer's Perspective
  • Disrupting Risk Management through Emerging Technologies
  • Automated Testing For Protecting Data Pipelines from Undocumented Assumptions
  • Geosp.AI.tial: Applying Big Data and Machine Learning to Solve the World's Toughest Geospatial Intelligence Problems
  • Healthcare Claim Reimbursement using Apache Spark
  • From Idea to Model: Productionizing Data Pipelines with Apache Airflow
  • Willump: Optimizing Feature Computation in ML Inference
  • Real-Time Forecasting at Scale using Delta Lake and Delta Caching
  • Leveraging Apache Spark to Develop AI-Enabled Products and Services at Bosch
  • Continuous Delivery of Deep Transformer-Based NLP Models Using MLflow and AWS Sagemaker for Enterprise AI Scenarios
  • From Python to PySpark and Back Again – Unifying Single-host and Distributed Deep Learning with Maggy
  • Shparkley: Scaling Shapley with Apache Spark
  • Understanding and Improving Code Generation
  • Machine Learning Data Lineage with MLflow and Delta Lake
  • Memory Optimization and Reliable Metrics in ML Pipelines at Netflix
  • Operationalizing Machine Learning at Scale at Starbucks
  • Presto on Apache Spark: A Tale of Two Computation Engines
  • Generalized SEIR Model on Large Networks
  • Deep Learning at Scale with Apache Spark and Determined
  • How R Developers Can Build and Share Data and AI Applications that Scale with Databricks and RStudio Connect
  • Rapid Response to Hospital Operations using Data and AI during COVID-19
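Unless otherwise stated, all articles on this blog are original.
    Copyright of original articles belongs to 过往记忆大数据 (过往记忆); do not reproduce without permission.
    Link to this article: 【Spark Summit North America 202006 HD slides download】(https://www.iteblog.com/archives/9832.html)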