Tutorial: Analyzing Ice Cream Shops in Utah Using Yelp Data

1. Introduction

This tutorial demonstrates how to use the Ice Cream Shop Analysis Python package to collect, clean, analyze, and visualize Yelp business data.

The project focuses on ice cream shops in Utah County and Salt Lake County and investigates how business characteristics such as:

  • Review volume
  • Delivery availability
  • Price level
  • City location

relate to customer ratings on Yelp.


2. Installation

2.1 Requirements

To use this package and run the Streamlit app, you will need:

  • Python 3.11 or higher
  • Git
  • A virtual environment (recommended)

The project uses Python packaging with pyproject.toml, and a simple requirements.txt file is provided for running the Streamlit app.

2.2 Install from GitHub

Once finalized, the package will be installable directly from GitHub:

pip install git+https://github.com/yourusername/yelp_final_project.git

Alternatively, you can clone the repository and install locally:

git clone https://github.com/yourusername/yelp_final_project.git
cd yelp_final_project
pip install .

To install dependencies for running the Streamlit app only:

pip install -r requirements.txt

3. Project Structure

The repository follows a standard src-layout Python project structure:

yelp_final_project/
├── pyproject.toml
├── requirements.txt
├── README.md
├── src/
│   └── yelp_final_project/
│       ├── cleaning.py
│       ├── analysis.py
│       ├── streamlit_app.py
│       └── __init__.py
├── tests/
├── docs/
└── Tutorial.qmd

Key components:

  • cleaning.py: data loading and cleaning functions
  • analysis.py: summary statistics and analysis functions
  • streamlit_app.py: interactive Streamlit dashboard
  • docs/: Quarto documentation and tutorial

4. Data Cleaning

The cleaning pipeline standardizes raw Yelp data into a format suitable for analysis.

Key cleaning steps include:

  • Standardizing column names
  • Converting dollar-sign price strings ($–$$$$) into numeric price levels
  • Creating a unified city column
  • Classifying businesses by service type
  • Removing unused or irrelevant columns
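
As an illustration, the price-conversion step above might look like the following sketch. The column names ("price", "price_level") are assumptions for demonstration; the actual clean_data() implementation may use different names.

```python
import pandas as pd

# Illustrative raw data with Yelp-style dollar-sign price strings.
df = pd.DataFrame({
    "name": ["Scoops", "Gelato Co", "Frosty"],
    "price": ["$", "$$", None],  # missing price is common in Yelp data
})

# Map dollar-sign strings to numeric levels; unmapped values become NaN.
price_map = {"$": 1, "$$": 2, "$$$": 3, "$$$$": 4}
df["price_level"] = df["price"].map(price_map)

print(df[["name", "price_level"]])
```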

Example: Cleaning the data

from yelp_final_project.cleaning import clean_data

df_clean = clean_data()
df_clean.head()

This function returns a cleaned pandas DataFrame that is used throughout the analysis and Streamlit app.


5. Analysis Functions

The package includes several analysis helpers that summarize relationships between business characteristics and ratings.

5.1 Reviews vs Rating

from yelp_final_project.analysis import reviews_vs_rating

reviews_summary = reviews_vs_rating(df_clean)
reviews_summary

This function groups businesses and summarizes how review volume relates to average customer ratings.
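
Conceptually, such a summary can be produced by binning shops by review count and averaging ratings per bin. The sketch below is a hypothetical stand-in, not the package's actual implementation; the bin edges and the "review_count"/"rating" column names are assumptions.

```python
import pandas as pd

def reviews_vs_rating_sketch(df):
    # Bin shops into low/medium/high review-volume groups,
    # then compute the mean rating within each group.
    bins = [0, 50, 200, float("inf")]
    labels = ["low", "medium", "high"]
    binned = pd.cut(df["review_count"], bins=bins, labels=labels)
    return df.groupby(binned, observed=True)["rating"].mean()

df = pd.DataFrame({
    "review_count": [10, 120, 500, 30],
    "rating": [4.0, 4.5, 4.8, 3.5],
})
print(reviews_vs_rating_sketch(df))
```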

5.2 Price Level vs Rating

from yelp_final_project.analysis import price_vs_rating

price_summary = price_vs_rating(df_clean)
price_summary

This analysis explores whether higher-priced ice cream shops tend to receive higher ratings.
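
A minimal sketch of this kind of summary, assuming the cleaned data has numeric "price_level" and "rating" columns (names are illustrative, not the package's schema):

```python
import pandas as pd

df = pd.DataFrame({
    "price_level": [1, 1, 2, 2, 3],
    "rating": [4.0, 4.4, 4.5, 4.7, 4.2],
})

# Average rating and shop count per price level.
price_summary = df.groupby("price_level")["rating"].agg(["mean", "count"])
print(price_summary)
```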

5.3 City vs Rating

from yelp_final_project.analysis import city_vs_rating

city_summary = city_vs_rating(df_clean)
city_summary

This function compares average ratings across cities in Utah.

5.4 Service Type vs Rating

from yelp_final_project.analysis import service_type_vs_rating

service_summary = service_type_vs_rating(df_clean)
service_summary

This analysis examines how service options (e.g., takeout, dine-in) relate to customer ratings.

5.5 Second Most Common Category

from yelp_final_project.analysis import second_most_common_category

second_cat = second_most_common_category(df_clean)
second_cat

This helper identifies the most common business category besides ice cream and frozen yogurt.
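
One way to implement such a helper is to explode each shop's category list, count occurrences, and take the second most frequent label. The sketch below is hypothetical; the "categories" column (a list of labels per shop) is an assumption about the cleaned data's shape.

```python
import pandas as pd

def second_most_common_sketch(df):
    # Flatten the per-shop category lists, count each label,
    # and return the second most frequent one.
    counts = df["categories"].explode().value_counts()
    return counts.index[1]

df = pd.DataFrame({
    "categories": [
        ["Ice Cream", "Desserts"],
        ["Ice Cream", "Coffee"],
        ["Ice Cream", "Desserts"],
    ]
})
print(second_most_common_sketch(df))
```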


6. Streamlit App

An interactive dashboard is included to make the analysis accessible without writing any code.

6.1 Running the App

From the project root:

streamlit run src/yelp_final_project/streamlit_app.py

The app allows users to:

  • Preview raw and cleaned data
  • Apply filters by city, price level, and service type
  • Explore interactive charts and tables
  • View a geographic map of ice cream shops across Utah
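
Under the hood, the app's filters amount to boolean selections on the cleaned DataFrame. The sketch below shows the general idea with plain pandas; the column names ("city", "price_level") and filter values are illustrative assumptions, not the app's exact code.

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["Scoops", "Frosty", "Gelato Co"],
    "city": ["Provo", "Salt Lake City", "Provo"],
    "price_level": [1, 2, 3],
})

# Values that, in the app, would come from Streamlit sidebar widgets.
selected_city = "Provo"
max_price = 2

# Combine the filters with a boolean mask.
filtered = df[(df["city"] == selected_city) & (df["price_level"] <= max_price)]
print(filtered["name"].tolist())
```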

7. Documentation and GitHub Pages

All documentation for this project—including this tutorial—is built using Quarto and hosted on GitHub Pages.

The docs/ folder contains:

  • Function and module documentation
  • This tutorial
  • A written project report

Links to the published documentation and Streamlit app are provided in the project README.


8. Conclusion

This package implements each step of a data science workflow: gathering, cleaning, analyzing, and visualizing data. Its main components are:

  • Data cleaning and preprocessing
  • Exploratory data analysis
  • Interactive visualization
  • Reproducible documentation

By combining Python packaging, testing, Streamlit, and Quarto, this project provides a template readers can follow to gather, analyze, and present their own data.