阅读 123

Adventure Works案例分析

本文是对Adventure Works案例的一个总结,记录了整个项目需求分析与实现的过程,主要任务是使用python完成ETL过程,并且连接到PowerBI实现可视化。中间加入了个人的一些思考,最终将整个分析成果展示出来。

分析成果的链接:销售报表




本文目录:


  • 项目背景

  • 介绍需求分析

  • 实现报表制作

一、项目背景介绍:

Adventure Works Cycles是Adventure Works样本数据库所虚构的公司,该公司生产和销售自行车到全国各地。

1、客户类型

这家公司的客户主要有两种:

  • 个体:这些客户购买商品是通过网上零售店铺;

  • 商店:这些是从Adventure Works Cycles销售代表处购买转售产品的零售店或批发店。

2、产品介绍

这家公司主要有下面四个产品线:

  • Adventure Works Cycles生产的自行车;

  • 自行车部件,例如车轮,踏板或制动组件;

  • 从供应商处购买的自行车服装,用于转售给Adventure Works Cycles的客户;

  • 从供应商处购买的自行车配件,用于转售给Adventure Works Cycles的客户。

项目数据来源:数据来源于adventure Works Cycles公司的的样本数据库。

二、需求分析与实现

项目目标

通过现有数据监控商品的销售情况,并且获取最新的商品销售趋势,以及区域分布情况,为公司的制造和销售提供指导性建议,以增加公司的收益。

实操

登录mysql观察adventure_ods到有三个数据表,ods_sales_orders,ods_customer,dim_date_df
将三个表简单梳理成一份数据字典方便分析:


ods_sales_orders

ods_customer


dim_date_df

图片.png

利用python进行加工

首先,我们利用python从数据库导出ods_sales_orders并生成sum_amount_order(销量订单聚合表),求总销售金额/客单价

#导出ods_sales_ordersods_sales_orders = pd.read_sql_query("select * from ods_sales_orders ", con=adventure_ods)#生成sum_amount_order(销量订单聚合表)sum_amount_customer = ods_sales_orders.groupby('create_date').agg({'customer_key':'nunique','unit_price':sum}).reset_index().rename(columns={'unit_price': 'sum_amount', 'customer_key': 'sum_order'})#求出客单价sum_amount_customer['amount_div_order'] = sum_amount_customer['sum_amount'] / sum_amount_customer['sum_order']sum_amount_order = sum_amount_customer
sum_amount_order.head()

在这里自己创造一个随机数,生成销量聚合目标表

"""step2:利用空列表及循环生成对应随机值,与销量订单聚合表合并形成sum_amount_order_goal(销量订单聚合目标表)"""sum_amount_goal_list = []sum_order_goal_list = []# 转为list类型,遍历每个日期create_date_list = list(sum_amount_order['create_date'])for i in create_date_list:
    a = random.uniform(0.85, 1.1)
    b = random.uniform(0.85, 1.1)
    amount_goal = list(sum_amount_order[sum_amount_order['create_date'] == i]['sum_amount'])[0] * a
    order_goal = list(sum_amount_order[sum_amount_order['create_date'] == i]['sum_amount'])[0] * b
    sum_amount_goal_list.append(amount_goal)
    sum_order_goal_list.append(order_goal)# 合并sum_amount_order表与刚生成的目标值列,形成sum_amount_order_goal表sum_amount_order_goal = pd.concat([sum_amount_order, pd.DataFrame(
        {'sum_amount_goal': sum_amount_goal_list, 'sum_order_goal':
            sum_order_goal_list})], axis=1)sum_amount_order_goal.head()

求出每日订单,销量环比

#amount_diff,order_diff
sum_amount_order_goal['amount_diff']=sum_amount_order_goal['sum_amount'].pct_change().fillna(0)
sum_amount_order_goal['order_diff']=sum_amount_order_goal['sum_order'].pct_change().fillna(0)

订单环比表


然后将其于日期维度表合并就能得到每日环比表

#读取dim_date_df日期维度表dim_date_df = """ 
        select create_date,
                is_current_year,
                is_last_year,
                is_yesterday,
                is_today,
                is_current_month,
                is_current_quarter
                from dim_date_df"""dim_date_df = pd.read_sql_query(dim_date_df, con=adventure_ods)"""进行数据的融合,生成dw_order_by_day表"""# 转化create_date格式为标准日期格式sum_amount_order_goal['create_date'] = sum_amount_order_goal['create_date'].apply(lambda x: x.strftime('%Y-%m-%d'))# 通过主键create_date连接日期维度dw_order_by_day = pd.merge(sum_amount_order_goal, dim_date_df, on='create_date', how='inner')dw_order_by_day.head()

然后将每日环比表存入数据仓库方便调用

try:
    pd.read_sql_query("Truncate table dw_order_by_day", con=adventure_dw)except Exception as e:
    print("删除旧的dw_order_by_day表,error:{}")dw_order_by_day.to_sql('dw_order_by_day_{}'.format(STUDENT_NAME), con=adventure_dw,
                       if_exists='replace', index=False)
根据每日环比表得到同比表

计算出每天,昨日,本月,本季度,本年的交易总金额,去年同时期的时间维度及去年同时期的交易总金额

"""当天"""# 当天的总金额today_amount = dw_order_by_day[dw_order_by_day['is_today'] == 1]['sum_amount'].sum()# 去年同期的日期维度before_year_today = list(dw_order_by_day[dw_order_by_day['is_today'] == 1]
                         ['create_date'] + datetime.timedelta(days=-365))# 去年同期总金额before_year_today_amount = dw_order_by_day[dw_order_by_day['create_date'].isin(before_year_today)]['sum_amount'].sum()"""昨天"""# 昨天的总金额yesterday_amount = dw_order_by_day[dw_order_by_day['is_yesterday'] == 1]['sum_amount'].sum()# 去年同期的日期维度before_year_yesterday = list(dw_order_by_day[dw_order_by_day['is_yesterday'] == 1]
                             ['create_date'] + datetime.timedelta(days=-365))# 去年同期总金额before_year_yesterday_amount = dw_order_by_day[dw_order_by_day['create_date'].isin(
    before_year_yesterday)]['sum_amount'].sum()"""当前月份"""# 当月的总金额month_amount = dw_order_by_day[dw_order_by_day['is_current_month'] == 1]['sum_amount'].sum()# 去年同期的日期维度before_year_month = list(dw_order_by_day[dw_order_by_day['is_current_month'] == 1]
                         ['create_date'] + datetime.timedelta(days=-365))# 去年同期总金额before_year_month_amount = dw_order_by_day[dw_order_by_day['create_date'].isin(
    before_year_month)]['sum_amount'].sum()"""当前季度"""# 当前季度的总金额quarter_amount = dw_order_by_day[dw_order_by_day['is_current_quarter'] == 1]['sum_amount'].sum()# 去年同期的日期维度before_year_quarter = list(dw_order_by_day[dw_order_by_day['is_current_quarter'] == 1]
                           ['create_date'] + datetime.timedelta(days=-365))# 去年同期总金额before_year_quarter_amount = dw_order_by_day[dw_order_by_day['create_date'].isin(
    before_year_quarter)]['sum_amount'].sum()"""当前年份"""# 当年的总金额year_amount = dw_order_by_day[dw_order_by_day['is_current_year'] == 1]['sum_amount'].sum()# 去年同期的日期维度before_year_year = list(dw_order_by_day[dw_order_by_day['is_current_year'] == 1]
                        ['create_date'] + datetime.timedelta(days=-365))# 去年同期总金额before_year_year_amount = dw_order_by_day[dw_order_by_day['create_date'].isin(
    before_year_year)]['sum_amount'].sum()

计算出每天,昨日,本月,本季度,本年的交易总订单量,去年同时期的交易总订单量

"""当天"""# 当天的总订单数today_order = dw_order_by_day[dw_order_by_day['is_today'] == 1]['sum_order'].sum()# 去年同期总订单数before_year_today_order = dw_order_by_day[dw_order_by_day['create_date'].isin(
    before_year_today)]['sum_order'].sum()"""昨天"""# 昨天的总金额yesterday_order = dw_order_by_day[dw_order_by_day['is_yesterday'] == 1]['sum_order'].sum()# 去年同期总金额before_year_yesterday_order = dw_order_by_day[dw_order_by_day['create_date'].isin(
    before_year_yesterday)]['sum_order'].sum()"""当前月份"""# 当月的总金额month_order = dw_order_by_day[dw_order_by_day['is_current_month'] == 1]['sum_order'].sum()# 去年同期总金额before_year_month_order = dw_order_by_day[dw_order_by_day['create_date'].isin(
    before_year_month)]['sum_order'].sum()"""当前季度"""# 当前季度的总金额quarter_order = dw_order_by_day[dw_order_by_day['is_current_quarter'] == 1]['sum_order'].sum()# 去年同期总金额before_year_quarter_order = dw_order_by_day[dw_order_by_day['create_date'].isin(
    before_year_quarter)]['sum_order'].sum()"""当前年份"""# 当年的总金额year_order = dw_order_by_day[dw_order_by_day['is_current_year'] == 1]['sum_order'].sum()# 去年同期总金额before_year_year_order = dw_order_by_day[dw_order_by_day['create_date'].isin(
    before_year_year)]['sum_order'].sum()

然后求出当日,昨日,本月,本季度,本年的同比

amount_dict = {'today_diff': [today_amount / before_year_today_amount - 1,
                              today_order / before_year_today_order - 1,
                              (today_amount / today_order) / (before_year_today_amount /
                                                              before_year_today_order) - 1],
               'yesterday_diff': [yesterday_amount / before_year_yesterday_amount - 1,
                                  yesterday_order / before_year_yesterday_order - 1,
                                  (yesterday_amount / yesterday_order) / (before_year_yesterday_amount /
                                                                          before_year_yesterday_order) - 1],
               'month_diff': [month_amount / before_year_month_amount - 1,
                              month_order / before_year_month_order - 1,
                              (month_amount / month_order) / (before_year_month_amount /
                                                              before_year_month_order) - 1],
               'quarter_diff': [quarter_amount / before_year_quarter_amount - 1,
                                quarter_order / before_year_quarter_order - 1,
                                (quarter_amount / quarter_order) / (before_year_quarter_amount /
                                                                    before_year_quarter_order) - 1],
               'year_diff': [year_amount / before_year_year_amount - 1,
                             year_order / before_year_year_order - 1,
                             (year_amount / year_order) / (before_year_year_amount /
                                                           before_year_year_order) - 1],
               'flag': ['amount', 'order', 'avg']}

dw_amount_diff = pd.DataFrame(amount_dict)
dw_amount_diff


然后将同比表导入到数据仓库中待用

dw_amount_diff.to_sql('dw_amount_diff', con=engine,
                       if_exists='replace', index=False)
形成按照 订单日期/产品名/产品子类/产品类别/所在区域/所在省份/所在城市的逐级聚合表

利用python从数据库中导入ods_sales_orders,dw_order_by_day,dim_date_df,合并表格并进行聚合

# 通过客户id连接表sales_customer_order = pd.merge(ods_sales_orders, ods_customer, left_on='customer_key',
                                right_on='customer_key', how='left')# 提取订单主键/订单日期/客户编号/产品名/产品子类/产品类别/产品单价/所在区域/所在省份/所在城市sales_customer_order = sales_customer_order[["sales_order_key", "create_date", "customer_key",
                                            "english_product_name", "cpzl_zw", "cplb_zw", "unit_price",
                                             "chinese_territory",
                                             "chinese_province",
                                             "chinese_city"]]# 形成按照订单日期/产品名/产品子类/产品类别/所在区域/所在省份/所在城市的逐级聚合表,获得订单总量/客户总量/销售总金额sum_customer_order = sales_customer_order.groupby(["create_date", "english_product_name", "cpzl_zw", "cplb_zw",
                                                   "chinese_territory", "chinese_province",
                                                   "chinese_city"], as_index=False).\
    agg({'sales_order_key': pd.Series.nunique, 'customer_key': pd.Series.nunique,
         "unit_price": "sum"}).rename(columns={'sales_order_key': 'order_num',
                                               'customer_key': 'customer_num', 'unit_price': 'sum_amount',
                                               "english_product_name": "product_name"})# 转化订单日期为字符型格式sum_customer_order['create_date'] = sum_customer_order['create_date'].apply(lambda x: x.strftime('%Y-%m-%d'))# 获取当日日期维度dw_customer_order = pd.merge(sum_customer_order, dim_date_df, on='create_date', how='inner')dw_customer_order["create_date"] = pd.to_datetime(dw_customer_order["create_date"],format='%Y-%m-%d')dw_customer_order.head()

然后将其导出到数据仓库中待用

制作报表

  • 可视化工具:这里用到的可视化工具有折线图、柱形图、折线-柱形组合图、仪表、卡片、柱状图、切片器、地图等。可以根据需要选择图例、轴、列,以及设置数据处理方式,求和、平均值、最大值、最小值等。

  • 筛选器:有三种筛选器:视觉对象、此页、所有页面。这里用于日期、区域等字段的筛选。

  • 书签窗格:这里将按钮和书签结合使用,用于制作导航栏和动态图表。

主页展示内容:

a、基本销售指标,包括销售额、订单量、客单价、销售额kpi、销售同比去年等从时间维度分析年度、季度、月度、周、日销售情况


today

b、时间趋势图展示内容:
近一月的销售,销量情况


c、区域
按时间显示不同区域的销售金额和销售量


1人点赞

日记本



作者:光头强
链接:https://www.jianshu.com/p/aea881dc11c5


文章分类
后端
版权声明:本站是系统测试站点,无实际运营。本文内容由互联网用户自发贡献,该文观点仅代表作者本人。本站仅提供信息存储空间服务,不拥有所有权,不承担相关法律责任。如发现本站有涉嫌抄袭侵权/违法违规的内容, 请发送邮件至 XXXXXXo@163.com 举报,一经查实,本站将立刻删除。
相关推荐