How to Convert MapReduce Programs into Spark Programs

This article explains in detail how to convert MapReduce programs running on Hadoop into Spark applications. If you are interested, read on: The key to getting the most out of Spark is to understand the differences between its RDD API and the original Mapper and Reducer API. Venerable MapReduce has been Apache Hadoop's work-horse computation paradigm since its inception. It is ideal for the kinds of work for which Hadoop was originally des

w397090770 11 years ago (2014-09-07) 6450 views, 1 comment, 9 likes

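Spark meetup (Hangzhou) Slides

"Spark meetup (Beijing) Materials", "Spark meetup (Hangzhou) Slides", "Second Beijing Spark meetup Materials", "Third Beijing Spark meetup Materials", "Fourth Beijing Spark meetup Materials", "Fifth Beijing Spark meetup Materials", "Sixth Beijing Spark meetup Materials". On August 31 (13:30-17:30), Hangzhou's ...

w397090770 11 years ago (2014-09-01) 26673 views, 230 comments, 17 likes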

Spark meetup (Beijing) Materials

"Spark meetup (Beijing) Materials", "Spark meetup (Hangzhou) Slides", "Second Beijing Spark meetup Materials", "Third Beijing Spark meetup Materials", "Fourth Beijing Spark meetup Materials", "Fifth Beijing Spark meetup Materials", "Sixth Beijing Spark meetup Materials". Below is the Spark meetup (Beijing) ...

w397090770 11 years ago (2014-08-29) 24104 views, 204 comments, 16 likes

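Spark SQL & Spark Hive Programming, with a Performance Comparison Against Hive

Spark SQL has been available for a while now, so today I wrote a program to compare the performance of Spark SQL, Spark Hive, and queries run directly in Hive. All of the tests ran on YARN. First, my environment: 3 DataNodes and 2 NameNodes, each machine with 20 GB of memory and 24 cores; the data is all in LZO format, 336 files totaling 338.6 GB; no other jobs were running during the tests.

w397090770 11 years ago (2014-08-13) 50072 views, 9 comments, 51 likes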

SQL on Hadoop: Scenarios and Conclusions

The following article is reposted from an overseas website. It covers several SQL engines in the Hadoop ecosystem (Hive, Drill, Impala, Presto, and Spark/Shark), their use cases, how they compare, and some conclusions. Within the big data landscape there are multiple approaches to accessing, analyzing, and manipulating data in Hadoop. Each depends on key considerations such as latency, ANSI SQL completeness (and the ability to tolerate machine-generated SQL), developer and a

w397090770 11 years ago (2014-08-11) 9921 views, 0 comments, 14 likes

Spark Release 1.0.2 Published

Spark Release 1.0.2 was published on August 5, 2014. Spark 1.0.2 is a maintenance release with bug fixes. This release is based on the branch-1.0 maintenance branch of Spark. We recommend all 1.0.x users to upgrade to this stable release. Contributions to this release came from 30 developers. You can download Spark 1.0.2 as

w397090770 11 years ago (2014-08-06) 5825 views, 2 comments, 4 likes

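Spark Stable Release 0.9.2 Published

Spark 0.9.2 was released yesterday (July 23, 2014). Yes, you read that right: Spark 0.9.2. It is based on the 0.9 branch and fixes a number of bugs, and all 0.9.x users are encouraged to upgrade to this stable release. 28 developers contributed to it. Although Spark 1.0.x has already been released, it still contains quite a few bugs, whereas this release is a stable one.

w397090770 11 years ago (2014-07-24) 4643 views, 0 comments, 3 likes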

How to Run Spark Applications on CDH 5

Reposted from: http://blog.cloudera.com/blog/2014/04/how-to-run-a-simple-apache-spark-app-in-cdh-5/ (Editor's note – this post has been updated to reflect CDH 5.1/Spark 1.0) Apache Spark is a general-purpose, cluster computing framework that, like MapReduce in Apache Hadoop, offers powerful abstractions for processing large datasets. For various reasons pertaining to performance, functionality, and APIs, Spark is already be

w397090770 11 years ago (2014-07-18) 20174 views, 3 comments, 9 likes

Spark 1.0.1 Released

Spark 1.0.1 was released on July 11, 2014. The original announcement follows: We are happy to announce the availability of Spark 1.0.1! This release includes contributions from 70 developers. Spark 1.0.1 includes fixes across several areas of Spark, including the core API, PySpark, and MLlib. It also includes new features in Spark's (alpha) SQL library, including support for JSON data and performance and stability fixes. Visit the relea

w397090770 11 years ago (2014-07-13) 6889 views, 0 comments, 4 likes