
Overview
This was a semester-long group project in my Database Systems course (CS 3743).
This course was a deep dive into how organizations establish and maintain their operations using computerized systems, and into how such systems have been implemented in businesses from the 20th century onward. From defining business rules to deciding which types of systems an organization should adopt, the course went well beyond learning how to write SQL expressions. This was most apparent in our group project, in which we defined a SQL schema and procedural code to implement an operational database for a fictional company of our choice; we ultimately settled on a shipping company.
Group 10, Inc., our fictional shipping company, was defined as an organization whose primary objective is to “take order requests from customers, and ensure that they are reliably handled and transported to a destination”. To fulfill this goal, we devised a SQL schema to store and process the data needed to move an order from source to destination, traveling through our delivery network on various shipments and vessels along the way. We then implemented this schema as a set of tables on an instance of the MySQL DBMS; the following diagram shows the relationships between the entities defined within our organization.
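To give a feel for the kind of schema involved, here is a heavily simplified sketch of a few shipping-related tables and a join across them. The table and column names below are illustrative only, not the actual ones from our project, and SQLite (via Python's sqlite3 module) stands in for the MySQL server we used.

```python
import sqlite3

# In-memory SQLite database standing in for the MySQL server used in the project.
conn = sqlite3.connect(":memory:")

# Hypothetical, heavily simplified slice of a shipping schema: customers place
# orders, and each order rides on a shipment that travels on a vessel.
conn.executescript("""
CREATE TABLE customer (
    cust_id   INTEGER PRIMARY KEY,
    cust_name TEXT NOT NULL
);
CREATE TABLE vessel (
    vessel_id   INTEGER PRIMARY KEY,
    vessel_name TEXT NOT NULL
);
CREATE TABLE shipment (
    ship_id   INTEGER PRIMARY KEY,
    vessel_id INTEGER REFERENCES vessel(vessel_id)
);
CREATE TABLE "order" (
    order_id   INTEGER PRIMARY KEY,
    cust_id    INTEGER REFERENCES customer(cust_id),
    ship_id    INTEGER REFERENCES shipment(ship_id),
    dest_line1 TEXT,
    dest_line2 TEXT
);
""")

conn.execute("INSERT INTO customer VALUES (1, 'Acme Corp')")
conn.execute("INSERT INTO vessel VALUES (1, 'Truck 42')")
conn.execute("INSERT INTO shipment VALUES (1, 1)")
conn.execute("INSERT INTO \"order\" VALUES (1, 1, 1, '123 Main St', NULL)")

# Join an order back to the vessel carrying it.
row = conn.execute("""
    SELECT o.order_id, c.cust_name, v.vessel_name
    FROM "order" o
    JOIN customer c ON c.cust_id = o.cust_id
    JOIN shipment s ON s.ship_id = o.ship_id
    JOIN vessel   v ON v.vessel_id = s.vessel_id
""").fetchone()
print(row)  # (1, 'Acme Corp', 'Truck 42')
```

The real schema, shown in the diagram and the final SQL export linked below, contains far more entities and constraints than this sketch.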

In addition to the database’s design, we devised several queries that use triggers and procedural code to implement business rules and perform routine actions. For example, the following query, written by fellow group member Hunter Hoch, uses a subquery to retain only a limited amount of information on former employees, in keeping with the data retention policies that real-world organizations may have to deal with.
CREATE TEMPORARY TABLE IF NOT EXISTS FORMEREMPLOYEES AS
SELECT emp_ID, emp_lname, emp_fname, emp_initial, emp_areacode, emp_phone, emp_email,
       emp_SSN, emp_dob, emp_date_hired, emp_date_terminated, job_id, address_id
FROM (
    SELECT emp_ID, emp_lname, emp_fname, emp_initial, NULL AS emp_areacode, NULL AS emp_phone,
           emp_email, NULL AS emp_SSN, NULL AS emp_dob, emp_date_hired, emp_date_terminated,
           job_id, NULL AS address_id
    FROM employee
    WHERE emp_date_terminated IS NOT NULL
) AS subquery;
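The same retention pattern (keep identifying fields, blank out sensitive ones, include only terminated employees) can be exercised outside of MySQL. The sketch below uses SQLite through Python's sqlite3 module, with a made-up two-row employee table and an abbreviated column set:

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Minimal stand-in for the project's employee table (columns abbreviated;
# the sample rows are invented for illustration).
conn.executescript("""
CREATE TABLE employee (
    emp_ID INTEGER PRIMARY KEY,
    emp_lname TEXT, emp_SSN TEXT, emp_date_terminated TEXT
);
INSERT INTO employee VALUES (1, 'Smith', '111-22-3333', NULL);
INSERT INTO employee VALUES (2, 'Jones', '444-55-6666', '2023-05-01');
""")

# Same idea as the FORMEREMPLOYEES query: NULL out sensitive columns and
# include only employees with a termination date.
conn.execute("""
CREATE TEMPORARY TABLE FORMEREMPLOYEES AS
SELECT emp_ID, emp_lname, NULL AS emp_SSN, emp_date_terminated
FROM employee
WHERE emp_date_terminated IS NOT NULL
""")

rows = conn.execute("SELECT * FROM FORMEREMPLOYEES").fetchall()
print(rows)  # [(2, 'Jones', None, '2023-05-01')]
```

Only the terminated employee survives into the derived table, and their SSN is gone; the active employee is never copied at all.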
Outside of my involvement in the database and query design stages, my personal contributions to this project largely consisted of compiling data from external sources, such as the U.S. Postal Service’s Facility File and the U.S. Department of Transportation’s National Address Database (NAD), to populate the database with example data. This work ranged from techniques as simple as filtering by certain fields in Excel (as in the case of the Facility File) to tasks as involved as extracting a selection of addresses from a multi-gigabyte text file. In addition to retrieving external data, I also cleaned up incomplete SQL queries created by my teammates and loaded the data into the database server, using everything from simple INSERT statements to MySQL’s LOAD DATA statement on the command line. Despite these hurdles, we were ultimately able to put together a functional database, containing queries representative of real-life implementations of SQL databases.
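Filtering a multi-gigabyte address file comes down to streaming it one row at a time and keeping only complete rows. The sketch below shows the general shape of that approach; the column names and the "NOT STATED" placeholder mirror the issues described later in the Skills section, but the exact NAD column layout differs from this toy sample.

```python
import csv
import io

# Hypothetical NAD-like CSV sample; the real NAD uses different column names.
raw = io.StringIO(
    "Address,City,State,Zip_Code\n"
    "100 Main St,Springfield,IL,62701\n"
    "200 Oak Ave,NOT STATED,,\n"
    "300 Elm Rd,San Antonio,TX,78249\n"
)

REQUIRED = ("Address", "City", "State", "Zip_Code")

def is_complete(row):
    """Keep only rows where every required field is present and not a placeholder."""
    return all(row.get(f) and row[f] != "NOT STATED" for f in REQUIRED)

# csv.DictReader streams one row at a time, so memory use stays flat even
# when the input file is multiple gigabytes.
kept = [row for row in csv.DictReader(raw) if is_complete(row)]
print([r["Address"] for r in kept])  # ['100 Main St', '300 Elm Rd']
```

With a real file, the io.StringIO sample would be replaced by `open(path, newline="")`, and the kept rows would be written back out with csv.DictWriter rather than collected in a list.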
Aside from the technical challenges of implementing an operational database, we also faced challenges relating to our project’s scope and team organization. For example, while much of our database relies on postal address data, a far more robust addressing solution would be required in a real-world environment; regional variations and other complexities would quickly overwhelm the simple “lines 1 and 2” structure found in our implementation. We understood, however, that developing a robust address schema would have required an exorbitant amount of time for the purposes of our project, and we decided it fell outside our scope. In addition, while our team produced a strong final product, we hit some snags in completing our work consistently throughout the semester: after finalizing our database design, the data ingestion stage progressed slowly, as we had not set milestones to pace our work. This ultimately led to a “rush to the finish” in writing our procedural SQL and triggers, severely cutting into the time we had to write queries demonstrating the utility of our database. Regardless of these development challenges, our final database successfully fulfilled its purpose of serving as a solid foundation for a shipping business, an impressive feat for a four-student class project.
Methods used
Skills
- Proficiency in writing SQL statements to perform data insertion, modification, and querying.
- Understanding of the MySQL Workbench software, used to manage a local MySQL database server.
- Ability to use a command-line interface to log onto a local database server and run data insertion and query expressions from the CLI.
- Ability to perform data extraction to gather meaningful data out of poorly structured or incomplete CSV data.
  - Since the National Address Database gathers information from various independent local and state agencies, large chunks of its data lack important fields (e.g., ZIP code, state) or use placeholders (e.g., “NOT STATED”); rows with incomplete data were programmatically removed from the final sample of addresses inserted into the database.
  - Some data extraction was also performed on well-structured data, as in the case of the USPS Facility File; this work was done in Microsoft Excel.
- Proficiency in using regular expressions (regex) to filter text data.
  - Regular expressions were used within Visual Studio Code to modify the INSERT statements that add data to our database (e.g., FREIGHTCOMPANY, VESSEL).
- Proficiency in Python programming to read from and write to CSV data files.
  - Some standalone Python code was used in this project to read and write CSV files, while a Jupyter Notebook was used to more easily filter NAD data.
- Presentation design and communication skills, used in the project’s final presentation.
  - A Google Slides presentation was used to present the final database design, the purpose of each database table, and the purpose of each procedural SQL query and trigger.
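The regex-driven rewriting of INSERT statements mentioned above was done interactively with find-and-replace in Visual Studio Code, but the same idea can be sketched with Python's re module. The input lines and column layout here are made up for illustration; only the FREIGHTCOMPANY table name comes from the project.

```python
import re

# Hypothetical raw lines exported from a spreadsheet; in the project, similar
# capture-group replacements were run in Visual Studio Code's find-and-replace.
raw_lines = [
    "1, Acme Freight, TX",
    "2, Lone Star Shipping, TX",
]

# Capture id, name, and state, then rewrite each line as an INSERT statement.
pattern = re.compile(r"^(\d+),\s*([^,]+),\s*(\w+)$")
inserts = [
    pattern.sub(r"INSERT INTO FREIGHTCOMPANY VALUES (\1, '\2', '\3');", line)
    for line in raw_lines
]
print(inserts[0])  # INSERT INTO FREIGHTCOMPANY VALUES (1, 'Acme Freight', 'TX');
```

The same capture-group syntax works in VS Code's regex find-and-replace, with `$1`, `$2`, `$3` in place of Python's `\1`, `\2`, `\3`.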
Software, tools, and libraries
Project files
GitHub repository
Final SQL export
Project presentation (.pdf)

Media credits
Article thumbnail created by me. Diagram and title slide are extracted from the project presentation, which was created collaboratively by me and my team; icons within the presentation (and these images) are designed by Freepik (flaticon.com).
MySQL, Jupyter, Microsoft Excel 2019, Google Slides, and Visual Studio icons created by Icons8.