Lookreco Data Architecture Overview
I. Source Data Extraction
Data is extracted from Sensors Data (神策) over JDBC and stored in Hive.
Events to analyze
- AddCart
- CollectionView
- PayOrderDetail
- PayOrder
- ProductView
- Search
- SubmitOrderDetail
- SubmitOrder
Data extraction process
The process is similar for all events; ProductView is used as the example here.
Run the job directly:
spark-submit \
  --executor-cores 4 \
  --executor-memory 4g \
  --master yarn \
  --deploy-mode cluster \
  --driver-memory 4g \
  --conf spark.sql.autoBroadcastJoinThreshold=33554432 \
  --conf spark.core.connection.ack.wait.timeout=600 \
  --conf spark.sql.planner.sortMergeJoin=true \
  --conf spark.rpc.askTimeout=200 \
  --conf spark.rpc.message.maxSize=40 \
  --conf spark.shuffle.io.retryWait=30 \
  --conf spark.shuffle.io.maxRetries=10 \
  --conf spark.dynamicAllocation.enabled=true \
  --conf spark.shuffle.service.enabled=true \
  --conf spark.shuffle.service.port=7337 \
  --conf spark.locality.wait=0 \
  --jars /root/xuminghao/jars/geoip2-2.8.1.jar,/root/xuminghao/jars/maxmind-db-1.2.2.jar,/root/xuminghao/jars/jackson-databind-2.7.9.jar,/root/xuminghao/jars/jackson-annotations-2.7.0.jar,/root/xuminghao/jars/jackson-core-2.7.9.jar \
  --class com.look.oss.business.shencetohive.ProductViewMove \
  /root/xuminghao/jars/lookreco-1.0-SNAPSHOT.jar 20190107 20190107
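Since the process is described as similar for every event, the per-event submissions could be generated from the event list rather than typed by hand. The following is only a dry-run sketch under stated assumptions: it assumes each event has a job class named `<Event>Move` in the same package (only `ProductViewMove` is confirmed by this document), and the helper names `build_cmd` and `print_all` are hypothetical. The `--conf`, `--jars`, and resource options from the ProductView command are left out for brevity and would need to be added back before real use. The script only prints commands; it does not submit anything.

```shell
#!/bin/sh
# Dry-run sketch: print one spark-submit command per event.
# ASSUMPTION: every event has a class named <Event>Move in PACKAGE;
# only ProductViewMove appears in the original notes.
set -eu

EVENTS="AddCart CollectionView PayOrderDetail PayOrder ProductView Search SubmitOrderDetail SubmitOrder"
PACKAGE="com.look.oss.business.shencetohive"
APP_JAR="/root/xuminghao/jars/lookreco-1.0-SNAPSHOT.jar"

# Build the command line for a single event.
# $1 = event name, $2 and $3 = the two trailing date arguments (yyyyMMdd).
build_cmd() {
  printf 'spark-submit --master yarn --deploy-mode cluster --class %s.%sMove %s %s %s\n' \
    "$PACKAGE" "$1" "$APP_JAR" "$2" "$3"
}

# Print the command for every event in EVENTS with the same date arguments.
print_all() {
  for event in $EVENTS; do
    build_cmd "$event" "$1" "$2"
  done
}

# Print the commands only when exactly two date arguments are supplied.
if [ "$#" -eq 2 ]; then
  print_all "$1" "$2"
fi
```

Saved under a hypothetical name such as `print_jobs.sh`, it could be invoked as `sh print_jobs.sh 20190107 20190107`, and the printed lines reviewed before any of them is run.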