Nathaniel Boyle
Professional Site
Click on a project title, or the image beside it, to view documentation.
In this capstone project for the Google Data Analytics Certificate, offered by Google through Coursera, analyses are performed on the publicly available financial information of thirty companies across ten industries. Using SQL, the SEC Public Dataset in BigQuery's public dataset library (BigQuery is Google's data warehouse) is queried for information on publicly traded companies, and the results are downloaded into tables. Then, with a combination of SQL and Excel, the data in those tables is transformed into a standardized format so that comparisons can be drawn between the companies. The standardized data is then uploaded to RStudio, where R is used to produce visualizations of various financial ratios and growth patterns to evaluate the companies against one another. This is the most extensive project to date.
This project was the culminating assignment for the capstone course of the UC Davis Learn SQL Basics for Data Science Specialization, offered through Coursera. The capstone required picking a dataset and then using Python to load the selected data into a pandas data frame within a Jupyter notebook. Once the data was loaded, we were tasked with using a combination of Python and SQL code to query and explore the data, form a hypothesis, and ultimately run an analysis to test that hypothesis. Building on the earlier courses in the specialization, I chose to use an A/B testing analysis and to work with the Yelp dataset again. The other two options were sports performance data or data gathered from political tweets, and I wanted to stick with business-related data. The capstone course contained 4 milestones for the project; the attached PDF contains milestone 4, with the earlier milestones embedded inside the file as attachments. This is the most recent project to date.
In this final assignment for the "Data Wrangling, Analysis, and AB Testing with SQL" online course, offered by UC Davis through Coursera, we were tasked with using SQL to create variables and queries for an example dataset from an imaginary e-commerce company for the purpose of performing A/B testing. Then, using the results of our queries, we found the p-values for certain treatments to see whether they had any statistically significant effect on whether an item was viewed or ordered.
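As a rough illustration of the p-value step described above, here is a minimal sketch assuming a two-sided two-proportion z-test. The assignment's exact method is not stated here, so the test choice, the function name `two_proportion_p_value`, and the counts in the usage example are all hypothetical and are not taken from the project.

```python
from math import erfc, sqrt


def two_proportion_p_value(success_a, total_a, success_b, total_b):
    """Return (z, p) for a two-sided two-proportion z-test.

    Assumption: group A is the control and group B is the treatment;
    'success' means the item was viewed (or ordered) at least once.
    """
    rate_a = success_a / total_a
    rate_b = success_b / total_b

    # Pooled success rate under the null hypothesis of no difference.
    pooled = (success_a + success_b) / (total_a + total_b)
    std_err = sqrt(pooled * (1 - pooled) * (1 / total_a + 1 / total_b))
    if std_err == 0:
        raise ValueError("pooled rate is 0 or 1; the z-test is undefined")

    z = (rate_b - rate_a) / std_err
    # Two-sided tail probability of the standard normal distribution.
    p_value = erfc(abs(z) / sqrt(2))
    return z, p_value


if __name__ == "__main__":
    # Hypothetical counts, for illustration only.
    z, p = two_proportion_p_value(100, 1000, 150, 1000)
    print(f"z = {z:.3f}, p = {p:.5f}")
```

The counts passed to the function would come from the SQL queries, for example the number of items in each test group and how many of them were ordered. A p-value below the chosen significance level (commonly 0.05) would suggest the treatment had a statistically significant effect.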
This assignment was designed to test knowledge of a wide range of concepts and SQL design techniques discussed throughout the "SQL for Data Science" online course offered by UC Davis through Coursera. Specifically, we were tasked with playing the role of a real-world data scientist, using SQL both to answer specific questions for an organization and to make inferences based on our discoveries. The dataset used in the assignment came from a US-based organization called Yelp, which provides a platform for users to review and rate their interactions with a variety of organizations – businesses, restaurants, health clubs, hospitals, local governmental offices, charitable organizations, etc. Yelp has made a portion of this data available for personal, educational, and academic purposes. In the first part of the assignment, we answered a series of questions designed to help us profile and better understand the data. In the second part, we were to come up with our own question for analysis and prepare a dataset for the analysis we chose to do.
Currently under construction. Please check back for more.