Data Analysis Module
Overview
The analysis module contains functions that summarize relationships between Yelp business characteristics and customer ratings.
These functions are designed for exploratory data analysis and are used directly by the Streamlit app to generate tables and visualizations.
Functions
Reviews vs Rating
reviews_vs_rating(df)
This function summarizes how review volume relates to average customer ratings across businesses.
It groups businesses and computes:
- Average rating
- Average review count
- Number of shops per group
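The module's actual grouping rule is not documented here, so the following is only a minimal sketch of one way to produce such a summary. It assumes the cleaned DataFrame has `review_count` and `rating` columns and that shops are bucketed into quantile groups by review count. The function name, the column names, and the `n_groups` parameter are all hypothetical.

```python
import pandas as pd


def reviews_vs_rating_sketch(df: pd.DataFrame, n_groups: int = 3) -> pd.DataFrame:
    """Hypothetical sketch: bucket shops by review volume, then summarize each bucket."""
    # Assumed column names: "review_count" and "rating".
    # pd.qcut splits shops into roughly equal-sized buckets by review count;
    # duplicates="drop" merges bucket edges that coincide when many shops tie.
    buckets = pd.qcut(df["review_count"], q=n_groups, duplicates="drop")
    return (
        df.groupby(buckets, observed=True)
        .agg(
            avg_rating=("rating", "mean"),
            avg_review_count=("review_count", "mean"),
            num_shops=("rating", "size"),
        )
        .rename_axis("review_group")
        .reset_index()
    )
```

Quantile buckets keep the groups similar in size even when a few shops have far more reviews than the rest. Fixed cut points via `pd.cut` would be an equally reasonable choice.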
Price vs Rating
price_vs_rating(df)
This function examines whether price level is associated with higher or lower customer ratings.
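As an illustration only, a comparison like this can be written as a single grouped aggregation. The sketch assumes a `price` column holding labels such as `$` or `$$` and a `rating` column. It also assumes that shops with no listed price should be kept as their own group rather than dropped. The function name and column names are hypothetical and may differ from the module's.

```python
import pandas as pd


def price_vs_rating_sketch(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical sketch: average rating and shop count per price level."""
    # Assumed column names: "price" and "rating".
    # dropna=False keeps shops with a missing price as a separate group,
    # so the shop counts still add up to the number of rows in df.
    return (
        df.groupby("price", dropna=False)
        .agg(avg_rating=("rating", "mean"), num_shops=("rating", "size"))
        .reset_index()
    )
```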
City vs Rating
city_vs_rating(df)
This function compares average ratings across cities in Utah and counts the number of shops per city.
Service Type vs Rating
service_type_vs_rating(df)
This function evaluates whether service type (delivery, pickup, both, neither) is associated with differences in customer ratings.
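The four service types can be derived from two yes/no flags. The sketch below assumes the cleaned DataFrame carries boolean `delivery` and `pickup` columns. If the data already includes a ready-made service type column, the labeling step is unnecessary. All names here are hypothetical.

```python
import numpy as np
import pandas as pd


def service_type_vs_rating_sketch(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical sketch: label each shop's service type, then summarize ratings."""
    # Assumed boolean columns: "delivery" and "pickup"; assumed rating column: "rating".
    # np.select returns the choice for the first condition that is true,
    # so "both" must be tested before the single-service cases.
    service_type = np.select(
        [df["delivery"] & df["pickup"], df["delivery"], df["pickup"]],
        ["both", "delivery", "pickup"],
        default="neither",
    )
    return (
        df.assign(service_type=service_type)
        .groupby("service_type")
        .agg(avg_rating=("rating", "mean"), num_shops=("rating", "size"))
        .reset_index()
    )
```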
Second Most Common Category
second_most_common_category(df)
This function identifies the most common Yelp business category besides Ice Cream.
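One possible way to find that category is sketched here. It assumes a `categories` column in which each shop's categories are stored as one comma-separated string. The column name, the storage format, and the `exclude` parameter are assumptions, and the exact label to exclude depends on how the cleaned data spells it.

```python
import pandas as pd


def second_most_common_category_sketch(
    df: pd.DataFrame, exclude: str = "Ice Cream"
) -> pd.DataFrame:
    """Hypothetical sketch: most frequent category other than `exclude`."""
    # Assumed column: "categories", e.g. "Ice Cream, Desserts".
    # Split each string into a list, give every category its own row,
    # and strip the whitespace left over after the commas.
    categories = df["categories"].str.split(",").explode().str.strip()
    # value_counts sorts by frequency (descending) and ignores missing values.
    counts = categories[categories != exclude].value_counts()
    return counts.head(1).rename_axis("category").reset_index(name="num_shops")
```

The count equals the number of shops as long as no shop lists the same category twice.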
Purpose
All analysis functions:
- Accept a cleaned pandas DataFrame
- Perform data aggregation using pandas
- Return summary tables
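The shared contract listed above (DataFrame in, pandas aggregation, summary table out) can be illustrated with a small generic helper. This is a sketch under stated assumptions, not the module's code: `summarize_rating_by`, the `rating` column default, and the `city` column used in the usage note are hypothetical names.

```python
import pandas as pd


def summarize_rating_by(
    df: pd.DataFrame, group_col: str, rating_col: str = "rating"
) -> pd.DataFrame:
    """Hypothetical helper: one summary row per value of `group_col`."""
    summary = (
        df.groupby(group_col)
        .agg(avg_rating=(rating_col, "mean"), num_shops=(rating_col, "size"))
        .reset_index()
    )
    # Highest-rated groups first; ignore_index gives a clean 0..n-1 index.
    return summary.sort_values("avg_rating", ascending=False, ignore_index=True)
```

Under these assumptions, a per-city comparison would be `summarize_rating_by(df, "city")`. Because the result is an ordinary DataFrame, it can be shown as a table or passed to a plotting call without further conversion.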