1
Introduction
Overview
We delve into Olist's e-commerce sales figures and customer behaviors to extract actionable insights and enhance overall business performance. Olist is a Brazilian e-commerce platform, comparable to Amazon's marketplace, that connects small businesses with customers through its online marketplace.
This was the final project submission in my Data Visualization for Business Intelligence class and a team project by a group of 4.
Project Timeline
Nov - Dec 2023
The Problem
- You are preparing for an interview for a Data Analyst position.
- You are asked to select a project of your choice and make presentations – showing your skills in Data Cleaning/Preparation, Data Analysis/Visualization, and Storytelling.
- Identify datasets (with multiple files/multiple sources) that are big, rich, and contain both time and geographic information (in addition to other interesting fields).
The Goal
Analyzing the Business Performance of an E-commerce Platform in Brazil.
- Analyzing various business performance metrics.
- Getting insights and recommendations to improve Business Performance.
- Improve Product success and gain competitive advantage.
Why this Topic?
A few reasons for us to choose this topic were:
- Enables work on business-related analytics & performance.
- Rich dataset, also containing Time & Geographic information.
- Requires combining and cleaning data, which tests those skills.
- Potential to identify & showcase several permutations/linkages due to a large number of data fields.
- Size of dataset.
2
Dataset Description
Data Source
Datasets & Fields
1. Customer_Dataset : Describes the customers identified by their unique customer ID and their location - city, state & zip code.
2. Geolocation_Dataset : Describes the locations (latitude, longitude) along with city, state & zip code.
3. Products_Dataset : Describes the products by their product ID, category name, dimensions and no. of photos.
4. Order_Items : Describes the ordered items by their order ID, order item ID, product & seller ID along with shipping date, price and freight value.
5. Order_Payments : Describes the order payments by their order ID, the type of payment, payment value and the number of installments to complete the payment.
6. Order_Reviews : Describes the order reviews by their order ID, review ID along with the review score, title, message and creation date.
7. Order_Dataset : Describes the orders by their order ID, customer ID, order status and time of order purchase, order approval, estimated delivery time and actual time of delivery.
8. Product_Category_Names_Translation_Dataset : Provides translation of the product category titles from Portuguese to English.
9. Sellers_Dataset : Describes the seller by their seller ID & location - city, state & zip code.
3
A Plan for Analysis
Overview
Based on the available data at our disposal, we split the analysis into five (5) modules. This would allow us to analyze performance and suggest improvements to the business by focusing on different areas separately.
1. Performance Overview
2. Customer Behavior
3. Product Performance
4. Inventory Management
5. Seller Performance
Possible Scenarios to Explore
Product Performance:
● Which product category could the company invest in more?
● Most ordered items? Which products are the best sellers?
● Are there any products with declining sales?
● Any correlation between product ratings and sales?
Customer Behavior:
● Are there any trends in customer behavior based on the time of day/week?
● Any preferences based on Price?
● Are there any effects of Discounts & Promotions?
● What is the preferred Mode of Payment for a product?
Inventory Management:
● What could demand in certain regions mean to inventory levels?
● How can the supply-chain be optimized?
Effect of Customer Location on Orders:
● Which zip codes have the highest orders?
● What is the distribution of customer demographics (age, gender, location)?
Seller Performance:
● Which are the best performing sellers?
● How may sellers make use of this data?
4
Data Cleaning
By employing Tableau Prep Builder for these data cleaning steps, we ensured a streamlined and systematic approach, setting the foundation for a robust and well-prepared dataset, ready for in-depth exploration and analysis in the subsequent stages of the project.
Data Cleaning Steps
01
Converting Data Types/Roles
We systematically converted data types and roles to ensure consistency and accuracy in the dataset. By aligning data types with their appropriate roles, we could enhance the dataset's integrity, making it more conducive to subsequent analyses.
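This step was done in Tableau Prep Builder. For readers who prefer code, a rough pandas equivalent might look like the sketch below. The column names are assumed to match the public Olist files, and the toy table (which mixes a few columns from different files) uses invented rows.

```python
import pandas as pd

# Toy table for illustration only; the values are made up.
df = pd.DataFrame({
    "order_status": ["delivered", "shipped"],
    "order_purchase_timestamp": ["2017-10-02 10:56:33", "2018-07-24 20:41:37"],
    "customer_zip_code_prefix": [1310, 47813],
})

# Timestamps arrive as text; parse them so date arithmetic works.
df["order_purchase_timestamp"] = pd.to_datetime(df["order_purchase_timestamp"])

# Zip codes are labels, not quantities: keep them as zero-padded strings.
df["customer_zip_code_prefix"] = df["customer_zip_code_prefix"].astype(str).str.zfill(5)

# A small fixed set of statuses fits a categorical (a "dimension" role).
df["order_status"] = df["order_status"].astype("category")
```

Storing zip codes as text avoids losing leading zeros and stops them from being summed or averaged by accident.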
02
Grouping Values & Filtering
We grouped similar values and applied filters strategically. This not only organized the data but also facilitated a clearer representation of patterns and trends during the exploratory data analysis. Grouping and filtering were pivotal for isolating specific subsets relevant to the project's objectives.
03
Handling Null Values
Addressing null values is crucial for maintaining data completeness and reliability. We systematically handled null values by hiding or removing them where they were irrelevant, ensuring a robust and consistent dataset for subsequent analysis.
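As an illustration of the same idea outside Tableau Prep Builder, the pandas sketch below counts nulls and then drops rows that are irrelevant to a delivery analysis. The column names are assumed to match the public Olist orders file, the rows are invented, and dropping (rather than imputing) is just one possible choice.

```python
import pandas as pd

# Invented sample rows; a canceled order has no delivery date.
orders = pd.DataFrame({
    "order_id": ["a1", "b2", "c3"],
    "order_status": ["delivered", "canceled", "delivered"],
    "order_delivered_customer_date": ["2017-10-10", None, "2018-08-07"],
})

# Count missing values per column before deciding how to treat them.
null_counts = orders.isna().sum()

# Delivery analysis only makes sense for orders that actually arrived,
# so rows without a delivery date are dropped rather than imputed.
delivered = orders.dropna(subset=["order_delivered_customer_date"])
```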
04
Merging & Renaming Columns
To enhance the dataset's comprehensibility, we merged and renamed some columns. This step aimed to create a more coherent and standardized structure, simplifying the subsequent stages of data exploration and analysis.
05
Creating Calculated Fields
The creation of calculated fields allowed for the derivation of new variables based on existing ones. This step played a vital role in introducing additional dimensions or measures that proved instrumental in uncovering deeper insights during the data analysis phase.
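The calculated fields themselves were built in Tableau. The pandas sketch below shows what two such fields could look like; the definitions (revenue as price plus freight, delay as delivered date minus estimated date) are assumptions for illustration, not necessarily the ones used in the project, and the rows are made up.

```python
import pandas as pd

# Toy table that already combines order and item columns; values are invented.
df = pd.DataFrame({
    "order_id": ["a1", "b2"],
    "order_estimated_delivery_date": pd.to_datetime(["2017-10-18", "2018-08-01"]),
    "order_delivered_customer_date": pd.to_datetime(["2017-10-10", "2018-08-07"]),
    "price": [29.99, 118.70],
    "freight_value": [8.72, 22.76],
})

# Assumed definition: revenue per row is item price plus freight.
df["revenue"] = df["price"] + df["freight_value"]

# Positive values mean the order arrived after the estimated date.
df["delay_days"] = (
    df["order_delivered_customer_date"] - df["order_estimated_delivery_date"]
).dt.days

# Boolean flag that is convenient for filtering and grouping later.
df["is_late"] = df["delay_days"] > 0
```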
06
Discarding Obsolete Data
Cleaning involved identifying and discarding obsolete or redundant data points. Through Tableau Prep Builder, we systematically removed irrelevant information, ensuring the dataset remained focused on pertinent variables, contributing to the project's overall coherence and efficiency.
07
Data Joins
Tableau Prep Builder facilitated seamless data joins, enabling the integration of multiple datasets. This step was instrumental in incorporating diverse data sources, enriching the dataset and providing a holistic perspective for subsequent analyses.
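To make the join logic concrete, here is a minimal pandas sketch of how three of the files could be linked. It assumes the key columns are named `order_id` and `customer_id` as in the public Olist files; the rows are invented and the join order is only one reasonable option.

```python
import pandas as pd

# Invented sample rows for three of the nine files.
customers = pd.DataFrame({
    "customer_id": ["c1", "c2"],
    "customer_state": ["SP", "RJ"],
})
orders = pd.DataFrame({
    "order_id": ["o1", "o2", "o3"],
    "customer_id": ["c1", "c2", "c1"],
})
order_items = pd.DataFrame({
    "order_id": ["o1", "o1", "o2"],
    "price": [10.0, 20.0, 50.0],
})

# Items -> orders on order_id, then orders -> customers on customer_id.
# validate= raises an error if a key that should be unique on the right is not.
joined = (
    order_items
    .merge(orders, on="order_id", how="left", validate="many_to_one")
    .merge(customers, on="customer_id", how="left", validate="many_to_one")
)
```

Starting from the item level keeps one row per purchased item, so order `o3`, which has no items in this toy data, does not appear in the result.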
08
Aggregation of Data
Aggregating data involved summarizing and condensing information to a more manageable form. This step was crucial for simplifying complex datasets, enabling clearer insights and facilitating efficient analysis during the exploratory phase.
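A small pandas sketch of this kind of aggregation is shown below: item-level rows are summarized per state into order counts, revenue, and average revenue per order. The data is made up and the column names are assumptions based on the public Olist files.

```python
import pandas as pd

# Invented item-level rows after joining in the customer state.
items = pd.DataFrame({
    "order_id": ["o1", "o1", "o2", "o3"],
    "customer_state": ["SP", "SP", "RJ", "SP"],
    "price": [10.0, 20.0, 50.0, 30.0],
})

by_state = (
    items.groupby("customer_state")
    # Count distinct orders, since one order can span several item rows.
    .agg(orders=("order_id", "nunique"), revenue=("price", "sum"))
    .assign(revenue_per_order=lambda d: d["revenue"] / d["orders"])
    .sort_values("revenue", ascending=False)
)
```

Counting distinct order IDs instead of rows matters here, because a multi-item order would otherwise be counted more than once.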
Data Flow
Challenges
● Understanding Brazilian demographics beyond the Dataset.
● Language Translation changes.
● Duplicate Values for Cities.
● A large number of Grouping tasks were required.
● Creation of Calculated fields to make sense of unused data.
5
Preliminary Insights
Overview
Product Performance - Fashion & Clothing had the highest revenues, while Furniture & Appliances were the most-sold items.
Customer Behavior - Seasonality was observed in the number of sales across the year.
Performance by State - The states with the most Orders are the ones producing the lowest average revenue per order.
Inventory Management - There is a direct relationship between the Volume of goods and the Revenue generated.
Seller Performance - A comparison between Revenues across each state can help sellers identify areas of opportunity.
6
Data Visualization
Choice of Tools
For the visual representation of the ecommerce data, we opted for Tableau. The platform's versatility allowed us to create dynamic and interactive visualizations, aligning with the project's requirements for in-depth analysis and clear communication of insights.
The project used Story Points in Tableau, with a restriction of 20 points. We split these across our five (5) modules of study.
Visualization Goals
The visualization goals centered around uncovering trends, patterns, and performance metrics critical to the ecommerce business. We aimed to provide stakeholders with a visual narrative that would guide strategic decision-making and improve overall business outcomes.
Data Storytelling
Through carefully crafted visualizations, I wove a compelling data story that highlighted the customer preferences, product performance, and other key business metrics. The visual narrative aimed to make complex data accessible, facilitating a deeper understanding of the ecommerce ecosystem.
Types of Visualizations
We used a variety of visualization types: line, bubble, and bar charts and scatter plots for understanding product performance and customer preferences, and heatmaps, tree maps, and geographical maps for inventory management and seller performance. This ensured a comprehensive representation of diverse aspects of the ecommerce business.
To further improve visualization as per the storyline, we have also made use of several Calculated Fields, Hierarchies, Parameters & Groups in Tableau to make better sense of the data and portray meaningful visuals.
Interactivity & Engagement
To enhance user engagement, interactive elements were incorporated into the visualizations. This allows stakeholders to drill down into specific metrics, explore trends over time, and gain actionable insights directly from the visual representations.
This includes: Control of various Parameters; data on Tooltips; data references at the start of the visualization.
Insights Drawn
The insights from each visual were communicated on the same slide, allowing users to understand it on their own. This was done by adding notes & highlighting important focus points throughout the visualization.
This involved identifying top-performing products, understanding user preferences & patterns, and pinpointing areas for potential improvement. These insights serve as a foundation for strategic decision-making.
Design Choices
The design choices made in terms of color schemes, layout, and overall aesthetics were geared towards clarity and visual appeal.
We opted for a Blue-Orange color scheme to accommodate color-blind viewers.
By maintaining a cohesive design that also applied Gestalt Principles, the visualizations conveyed information effectively without sacrificing aesthetic appeal.
Outcomes
7
Overview of Revenues
We analyzed numbers by the 5 regions of Brazil, i.e., North, Northeast, Central West, South, and Southeast.
The total revenue generated was 208M over the time period (Q4 2016 to Q3 2018).
While the Southeast region made the highest revenue of 13M, Sao Paulo (SP) stood out as the state with the highest share of revenue by a large amount (37%) among all states in the country.
The city of Sao Paulo itself had 2.8M in sales.
Forecasting suggested that revenue per quarter would reach 7M, up from the last observed 4.8M in Q3 2018.
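The regional roll-up described in this section can be sketched in pandas as follows. The state-to-region mapping follows Brazil's five official regions; the revenue figures are invented placeholders and do not reproduce the project's numbers.

```python
import pandas as pd

# Brazil's five regions by two-letter state code (27 federative units).
REGIONS = {
    "North": ["AC", "AP", "AM", "PA", "RO", "RR", "TO"],
    "Northeast": ["AL", "BA", "CE", "MA", "PB", "PE", "PI", "RN", "SE"],
    "Central West": ["DF", "GO", "MT", "MS"],
    "Southeast": ["ES", "MG", "RJ", "SP"],
    "South": ["PR", "RS", "SC"],
}
STATE_TO_REGION = {state: region for region, states in REGIONS.items() for state in states}

# Placeholder revenue values, for illustration only.
sales = pd.DataFrame({
    "customer_state": ["SP", "RJ", "RS", "BA"],
    "revenue": [50.0, 20.0, 20.0, 10.0],
})
sales["region"] = sales["customer_state"].map(STATE_TO_REGION)

# Total revenue per region, and each state's share of overall revenue.
revenue_by_region = sales.groupby("region")["revenue"].sum()
share_by_state = sales.set_index("customer_state")["revenue"] / sales["revenue"].sum()
```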
Seasonality
Seasonality was observed by month, with August showing the highest sales figures & September the lowest.
Predicting High and Low demand can be used to forecast sales and stock up inventories.
Sellers can increase warehouse capacities accordingly and hire temporary staff to handle large order volumes.
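One simple way to check for this kind of monthly pattern in code is to count distinct orders per calendar month. The sketch below is an illustration on invented rows, assuming the purchase time is stored in `order_purchase_timestamp` as in the public Olist orders file.

```python
import pandas as pd

# Invented orders, for illustration only.
orders = pd.DataFrame({
    "order_id": ["o1", "o2", "o3", "o4"],
    "order_purchase_timestamp": pd.to_datetime(
        ["2017-08-03", "2018-08-15", "2017-09-01", "2018-08-20"]
    ),
})

# Pool all years and count distinct orders per calendar month (1 = January).
orders_per_month = (
    orders.groupby(orders["order_purchase_timestamp"].dt.month)["order_id"].nunique()
)

peak_month = orders_per_month.idxmax()
low_month = orders_per_month.idxmin()
```

If the study window covers some calendar months more often than others, dividing each count by the number of times that month appears in the window may give a fairer comparison.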
Preferred Payment Methods
Credit Cards were by far the most popular mode of payment.
The Brazilian mode of cash payment, i.e., Boleto, was also popular among customers, while Debit Cards were scarcely preferred.
Popular Product Segments
From the analysis of the performance of 11 Product Segments, the Home, Garden & Tools segment was the largest source of revenue on the e-commerce platform, followed by Clothing & Accessories.
The split of preferred payment modes showed that, on average, 75% of payments per segment were made by Credit Card. Suppliers can use this data for promotional items and tie-ups with banks for financing options.
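The per-segment payment split can be illustrated with a row-normalized cross-tabulation. In the sketch below, `segment` is a hypothetical column standing in for the project's product segments, the `payment_type` values are assumed to follow the public Olist payments file, and the rows are invented.

```python
import pandas as pd

# Invented payments; "segment" is a hypothetical grouping column.
payments = pd.DataFrame({
    "segment": ["Home", "Home", "Home", "Home", "Fashion", "Fashion"],
    "payment_type": ["credit_card", "credit_card", "credit_card", "boleto",
                     "credit_card", "debit_card"],
})

# Row-normalized crosstab: each segment's payment-type shares sum to 1.
share = pd.crosstab(payments["segment"], payments["payment_type"], normalize="index")

# Unweighted average of the credit card share across segments.
avg_credit_share = share["credit_card"].mean()
```

Note that this is an unweighted average across segments, so small segments count as much as large ones; weighting by payment count would give the overall share instead.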
Revenue by Segment
The Home, Garden & Tools segment consistently accounted for over 30% of the Revenue in every quarter of the studied time period.
An opportunity for improvement was promoting & increasing the market share of Food & Groceries, which only made up approx. 0.5 to 1% of the Revenue per quarter on the platform.
Effect of Delays on Low Ratings
Ratings with value 1 & 2 (out of 5) were directly related to delays in shipping. This implies a requirement for a robust shipping network, supply chain & efficient inventory management.
An analysis of no. of orders by month showed that one way to reduce delays could be to stock up more of these product segments before the start of the month.
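A relationship like the one between ratings and delays can be checked numerically as well as visually. The sketch below, on invented rows, computes the share of late orders per review score and a rank (Spearman-style) correlation; `delay_days` is a hypothetical calculated field (delivered date minus estimated date), and a correlation alone does not establish that delays cause low ratings.

```python
import pandas as pd

# Invented reviews joined with a hypothetical delay field (days late; <= 0 is on time).
reviews = pd.DataFrame({
    "review_score": [1, 1, 2, 4, 5, 5],
    "delay_days": [7, 3, 5, -2, -6, 0],
})
reviews["is_late"] = reviews["delay_days"] > 0

# Share of late deliveries within each review score.
late_rate_by_score = reviews.groupby("review_score")["is_late"].mean()

# Spearman correlation computed as the Pearson correlation of the ranks.
corr = reviews["review_score"].rank().corr(reviews["delay_days"].rank())
```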
8
Conclusion
In conclusion, this business analysis of the ecommerce website not only unearthed valuable insights but also showcased the power of data visualization in making complex business data accessible and actionable. The project contributes to my portfolio as a testament to my proficiency in data analytics and visualization, coupled with a keen understanding of ecommerce dynamics.
9
Future Steps
Predictive Analytics
Explore the integration of predictive analytics models to forecast future trends, customer behavior, and potential market shifts. This proactive approach can empower the ecommerce platform to anticipate and adapt to changing dynamics.
Supply Chain Optimization
The analysis can be extended to include supply chain and inventory data. A deeper study can be done to improve inventory management & seller performance.
Further, one can identify opportunities for optimizing logistics, reducing costs, and improving overall efficiency in the product delivery process.
Personalization Strategies
Develop and implement personalized user experiences based on the insights gained. Utilize data-driven recommendations and tailored content to enhance customer engagement, satisfaction, and conversion rates.