WANG LH, Research & Development

lookreco

2021.05.31 22:05

Lookreco Data Architecture Overview

I. Source data extraction

Data is extracted from Sensors Data (神策) over JDBC and stored in Hive.

Events to analyze

  1. AddCart
  2. CollectionView
  3. PayOrderDetail
  4. PayOrder
  5. ProductView
  6. Search
  7. SubmitOrderDetail
  8. SubmitOrder

Data extraction process

The process is similar for all events; ProductView is used as the example here.

Run the job directly:

spark-submit \
  --executor-cores 4 \
  --executor-memory 4g \
  --master yarn \
  --deploy-mode cluster \
  --driver-memory 4g \
  --conf spark.sql.autoBroadcastJoinThreshold=33554432 \
  --conf spark.core.connection.ack.wait.timeout=600 \
  --conf spark.sql.planner.sortMergeJoin=true \
  --conf spark.rpc.askTimeout=200 \
  --conf spark.rpc.message.maxSize=40 \
  --conf spark.shuffle.io.retryWait=30 \
  --conf spark.shuffle.io.maxRetries=10 \
  --conf spark.dynamicAllocation.enabled=true \
  --conf spark.shuffle.service.enabled=true \
  --conf spark.shuffle.service.port=7337 \
  --conf spark.locality.wait=0 \
  --jars /root/xuminghao/jars/geoip2-2.8.1.jar,/root/xuminghao/jars/maxmind-db-1.2.2.jar,/root/xuminghao/jars/jackson-databind-2.7.9.jar,/root/xuminghao/jars/jackson-annotations-2.7.0.jar,/root/xuminghao/jars/jackson-core-2.7.9.jar \
  --class com.look.oss.business.shencetohive.ProductViewMove \
  /root/xuminghao/jars/lookreco-1.0-SNAPSHOT.jar 20190107 20190107
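
Since the other seven events follow the same process, the per-event submission could be scripted. The following is only a dry-run sketch that prints one command per event rather than running it. It rests on two assumptions that this note does not confirm: that each event's job class is named <Event>Move in the same package as ProductViewMove, and that the two trailing arguments are a start date and an end date in yyyyMMdd form. The --conf and --jars options from the command above are left out to keep the sketch short.

```shell
#!/usr/bin/env bash
# Dry-run sketch: print one spark-submit command per event instead of running it.
# ASSUMPTION: every event's job class is <Event>Move; only ProductViewMove
# is confirmed by this note.
# ASSUMPTION: the two trailing arguments are start and end dates (yyyyMMdd).

EVENTS="AddCart CollectionView PayOrderDetail PayOrder ProductView Search SubmitOrderDetail SubmitOrder"
JAR_DIR=/root/xuminghao/jars
PACKAGE=com.look.oss.business.shencetohive

build_submit_cmd() {
  # $1 = event name, $2 = start date, $3 = end date
  echo "spark-submit --master yarn --deploy-mode cluster" \
       "--executor-cores 4 --executor-memory 4g --driver-memory 4g" \
       "--class ${PACKAGE}.${1}Move ${JAR_DIR}/lookreco-1.0-SNAPSHOT.jar $2 $3"
}

print_all_cmds() {
  # $1 = start date, $2 = end date
  for event in $EVENTS; do
    build_submit_cmd "$event" "$1" "$2"
  done
}

print_all_cmds 20190107 20190107
```

Check the printed class names against the actual contents of lookreco-1.0-SNAPSHOT.jar before executing any of the commands.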

II. Product preference statistics, feature computation, and feature vector generation