E-Commerce Analytics
Objective: Use Spark features for data analysis to derive valuable insights.
You are working as a Big Data consultant for an E-commerce company. Your role is to analyze sales data. The company has multiple stores across the globe and wants analytics on its sales transaction data. You need to provide valuable insights into sales across cities and states on a daily and weekly basis, along with insights into product reviews.
Analysis to be done: Exploratory analysis to determine actionable insights.
Dataset File: olist_public_dataset.csv
Insights on Historical Data
- Daily Insights
- SALES
- Total sales.
- Total sales in each Customer City.
- Total sales in each Customer State.
- ORDERS
- Total number of orders sold.
- City-wise order distribution.
- State-wise order distribution.
- Average review score per order.
- Average freight charges per order.
- Average time taken to approve the orders (Order Approved – Order Purchased).
- Average order delivery time.
- Weekly Insights
- Total sales.
- Total sales in each Customer City.
- Total sales in each Customer State.
- Total number of orders sold.
- City-wise order distribution.
- State-wise order distribution.
- Average review score per order.
- Average freight charges per order.
- Average time taken to approve the orders (Order Approved – Order Purchased).
- Average order delivery time.
- Total freight charges.
- Freight charge distribution in each Customer City.
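Both the daily and the weekly rollups above hinge on bucketing each order's purchase timestamp. A minimal sketch of that bucketing in plain Scala (so it can later be wrapped in a Spark UDF), assuming the Olist timestamp format `yyyy-MM-dd HH:mm:ss`:

```scala
import java.time.LocalDateTime
import java.time.format.DateTimeFormatter
import java.time.temporal.WeekFields

object TimeBuckets {
  private val fmt = DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss")

  // Daily grain: "2017-10-02 10:56:33" -> "2017-10-02"
  def dayBucket(ts: String): String =
    LocalDateTime.parse(ts, fmt).toLocalDate.toString

  // Weekly grain, ISO week numbering: "2017-10-02 10:56:33" -> "2017-W40"
  def weekBucket(ts: String): String = {
    val dt = LocalDateTime.parse(ts, fmt)
    val wf = WeekFields.ISO
    f"${dt.get(wf.weekBasedYear)}%d-W${dt.get(wf.weekOfWeekBasedYear)}%02d"
  }
}
```

Using ISO week-based years avoids double-counting the days around New Year: a late-December order can fall into week 1 of the following year, and an early-January order into week 52 of the previous one.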
Tasks to perform:
Week 1: Approach Overview and Basic Configurations
- Install Maven (3.6.2).
- Set the Maven environment variables
a) Verify the setup with “mvn -version”
- Install Java 1.8 and Scala 2.11.7
- Use IntelliJ to validate or modify the source code
- Run “mvn clean install” to build the jar file
- See README.md for detailed instructions and helper commands
Week 2: Data Ingestion
- Load the entire dataset from the CSV into Hive
- Copy the data from Hive into HDFS
- Verify the data at the HDFS path
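The ingestion step can be sketched as Hive DDL, built here as plain strings so they can be passed to `spark.sql(...)` or pasted into beeline. The table name and column list are assumptions based on the public Olist dataset, not confirmed by this outline; adjust them to the actual header of olist_public_dataset.csv:

```scala
// Hive DDL for the ingestion step, generated as plain strings.
// Column names and types are assumptions based on the Olist CSV.
object HiveIngestion {
  val createTable: String =
    """CREATE EXTERNAL TABLE IF NOT EXISTS olist_orders (
      |  order_id STRING,
      |  order_status STRING,
      |  order_products_value DOUBLE,
      |  order_freight_value DOUBLE,
      |  order_purchase_timestamp TIMESTAMP,
      |  order_aproved_at TIMESTAMP,
      |  order_delivered_customer_date TIMESTAMP,
      |  customer_city STRING,
      |  customer_state STRING,
      |  review_score INT
      |) ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
      |STORED AS TEXTFILE
      |TBLPROPERTIES ("skip.header.line.count"="1")""".stripMargin

  // Load the local CSV into the table; the path is a placeholder.
  def loadCsv(localPath: String): String =
    s"LOAD DATA LOCAL INPATH '$localPath' INTO TABLE olist_orders"
}
```

After the load, the files sit under the table's HDFS location (the Hive warehouse directory by default), which is what the "verify the data at the HDFS path" step inspects.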
Week 3: Data Streaming
- Create sample Maven Scala Project
- Add necessary spark dependencies
- Define the schema of the CSV files
- Create Spark Session
a) Add S3 details
b) Supply all credentials through environment variables, since they contain sensitive data
- Read the CSV file and convert it into a Dataset
- Create Map of City and Country
- Derive Hour, Month, Year, Day, and Day-Bucket columns from the date using a UDF
- Iterate over the metrics for each column
- For each type of segment, calculate stats across cities: max, min, average, and total record count
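The Week 3 steps can be condensed into one sketch: an explicit CSV schema, a SparkSession that picks up S3 credentials from environment variables (per step b), and a per-city stats computation (max, min, average, total records). All column names are assumptions based on the Olist dataset, not confirmed by the outline:

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions._
import org.apache.spark.sql.types._

object StreamingJob {
  // Explicit schema: avoids a second pass over the CSV for inference.
  val schema: StructType = StructType(Seq(
    StructField("order_id", StringType),
    StructField("customer_city", StringType),
    StructField("customer_state", StringType),
    StructField("order_products_value", DoubleType),
    StructField("order_freight_value", DoubleType),
    StructField("review_score", IntegerType),
    StructField("order_purchase_timestamp", TimestampType)
  ))

  def session(): SparkSession = {
    val spark = SparkSession.builder()
      .appName("olist-insights")
      .master(sys.env.getOrElse("SPARK_MASTER", "local[*]"))
      .getOrCreate()
    // S3 credentials come from the environment, never from source code.
    sys.env.get("AWS_ACCESS_KEY_ID").foreach(
      spark.sparkContext.hadoopConfiguration.set("fs.s3a.access.key", _))
    sys.env.get("AWS_SECRET_ACCESS_KEY").foreach(
      spark.sparkContext.hadoopConfiguration.set("fs.s3a.secret.key", _))
    spark
  }

  def readCsv(spark: SparkSession, path: String): DataFrame =
    spark.read.option("header", "true").schema(schema).csv(path)

  // Per-city daily stats for one metric column:
  // max, min, average, and total record count.
  def cityStats(df: DataFrame, metric: String): DataFrame =
    df.withColumn("day", to_date(col("order_purchase_timestamp")))
      .groupBy("day", "customer_city")
      .agg(
        max(metric).as(s"max_$metric"),
        min(metric).as(s"min_$metric"),
        avg(metric).as(s"avg_$metric"),
        count(lit(1)).as("records")
      )
}
```

`cityStats` is written to be called once per metric column (order value, freight, review score), matching the "iterate over the metrics" step; swapping `customer_city` for `customer_state` gives the state-level variant.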
Week 4: Data Analysis and Visualization
- Write the results to HDFS
- Save the final dataset to Amazon S3
- Create an Amazon DocumentDB cluster
- Save the insights in DocumentDB and provide APIs to view the aggregated data
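A sketch of the persistence step, assuming Parquet output. The same DataFrame writer targets HDFS or an `s3a://` URI; the DocumentDB write goes through the MongoDB Spark connector (Amazon DocumentDB is MongoDB-compatible), and the connection URI, database, and collection names are placeholders, not values from this outline:

```scala
import org.apache.spark.sql.{DataFrame, SaveMode}

object OutputSinks {
  // Works for hdfs://..., s3a://..., or a local path while testing.
  def writeParquet(df: DataFrame, path: String): Unit =
    df.write.mode(SaveMode.Overwrite).parquet(path)

  // Requires the mongo-spark-connector (10.x) dependency on the classpath;
  // uri/db/coll are placeholders for the DocumentDB cluster details.
  def writeDocumentDb(df: DataFrame, uri: String, db: String, coll: String): Unit =
    df.write
      .format("mongodb")
      .option("connection.uri", uri)
      .option("database", db)
      .option("collection", coll)
      .mode(SaveMode.Append)
      .save()
}
```

Keeping both sinks behind one small object means the Week 3 job only hands over DataFrames; the API layer then reads the aggregated collections straight from DocumentDB.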