Problems Encountered Migrating to AWS EMR 4.1.0 with Spark 1.5.0

Posted on 2015-11-11 09:42:59
I recently migrated to AWS EMR 4.1.0, which ships with Spark 1.5.0. While deploying our Spark Streaming application I ran into quite a few problems:

1. ClassCastException
org.apache.spark.deploy.SparkHadoopUtil cannot be cast to org.apache.spark.deploy.yarn.YarnSparkHadoopUtil

https://github.com/apache/spark/pull/9174
https://github.com/apache/spark/pull/8911

Make sure SPARK_YARN_MODE is set in the environment before invoking spark-submit:
export SPARK_YARN_MODE=true
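
For reference, a minimal submit sketch with the workaround applied (the application jar and main class names are placeholders, not from the original post):

# Force Spark to pick YarnSparkHadoopUtil (see the PRs above)
export SPARK_YARN_MODE=true
# Placeholder jar/class names for illustration only
spark-submit \
  --master yarn-cluster \
  --class com.example.StreamingApp \
  my-streaming-app.jar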

2. Guava NoSuchMethodError
java.lang.NoSuchMethodError: com.google.common.collect.Queues.newArrayDeque()Ljava/util/ArrayDeque

https://community.cloudera.com/t ... hodError/td-p/21159
- Add a bootstrap action that copies guava-16.0.1.jar to each node
- Put guava-16.0.1.jar first in both --driver-class-path and spark.executor.extraClassPath (see the sketch below)
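
A hedged sketch of both steps; the S3 path, the on-node jar location, and the application jar are placeholders:

# Bootstrap action (runs on every node): fetch a newer Guava
aws s3 cp s3://my-bucket/jars/guava-16.0.1.jar /home/hadoop/guava-16.0.1.jar

# Put the newer Guava first on both driver and executor class paths
# so it shadows the older Guava bundled with the Hadoop distribution
spark-submit \
  --master yarn-cluster \
  --driver-class-path /home/hadoop/guava-16.0.1.jar \
  --conf spark.executor.extraClassPath=/home/hadoop/guava-16.0.1.jar \
  my-streaming-app.jar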

3. Spark Not Using All Available Cores
By default, YARN's CapacityScheduler accounts only for memory when placing containers and ignores vcores, so the YARN UI reports 1 vcore in use per executor regardless of how many cores each executor actually has.

https://forums.aws.amazon.com/th ... dID=218950&tstart=0
Set yarn.scheduler.capacity.resource-calculator=org.apache.hadoop.yarn.util.resource.DominantResourceCalculator in capacity-scheduler.xml
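
On EMR 4.x this can also be applied at cluster creation via the capacity-scheduler configuration classification, instead of editing the file on the master by hand. A sketch, with instance settings as placeholders:

aws emr create-cluster \
  --release-label emr-4.1.0 \
  --applications Name=Spark \
  --use-default-roles \
  --instance-type m3.xlarge \
  --instance-count 3 \
  --configurations '[{"Classification":"capacity-scheduler","Properties":{"yarn.scheduler.capacity.resource-calculator":"org.apache.hadoop.yarn.util.resource.DominantResourceCalculator"}}]'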