Ben Hayes

Optimizing Language Models with DSPy

Prompt engineering for language models lacks the refinement that enterprise solutions need. Optimize your language models with DSPy!

Jul 3, 2024 12 minutes Ben Hayes

Optimizing language models is an important task as organizations integrate generative AI capabilities into their business processes. Relying on manual prompt engineering introduces risk, biases, and other problems. Continue reading to learn how to mitigate these risks with DSPy.

Multimodal AI with LLaVA on Databricks

LLaVA, like other multimodal AI models, leaps beyond standalone large language models and allows us to engage with AI in novel ways.

Mar 22, 2024 14 minutes Ben Hayes

Understanding our world often requires interpreting stimuli from multiple sources, including vision and text. LLaVA accomplishes this by combining a vision encoder with a large language model. Continue reading to learn how to leverage multimodal AI with Databricks and LLaVA! 🌋

Object Detection with YOLOv8 on Databricks

Using Ultralytics YOLOv8 and Databricks, we can build and deploy an AI model to reliably detect objects.

Feb 24, 2024 12 minutes Ben Hayes

For many AI-powered use cases, object detection emerges as a critical capability. Examples include driving cars, performing surgery, or identifying product defects. Continue reading to learn how to leverage Databricks and YOLOv8 to detect objects! 🤖

Getting Started with OpenAI

OpenAI emerged as a leader in generative AI and is helping enterprises modernize their processes at a rapid pace. Get started with generative AI, OpenAI, and more!

Oct 9, 2023 8 minutes Ben Hayes

Generative AI is a compelling technology whether you are an artist looking for inspiration or a CEO looking to innovate. OpenAI captured the world's imagination after introducing ChatGPT - how can you get started with OpenAI's platform? Continue reading to learn more.

Hands-on with Edge AI

Edge AI emerges at the intersection of AI, microcontrollers, and process automation. Jump start your understanding of these topics and more!

Mar 4, 2023 9 minutes Ben Hayes

Deploying AI in the field challenges organizations to this day. Thanks to innovation, constraints such as limited tools, employee skills, and hardware specifications are becoming blockers of the past. Continue reading to learn more.

Data Science at McKinsey

A data scientist's experience at The Firm.

Jul 23, 2022 6 minutes Ben Hayes

Data science, ML, and AI are changing organizations. What does this mean for consultants and clients? Continue reading to learn about my experience as a data scientist at McKinsey.

Building Data Pipelines with Kedro

Kedro, an open-source Python framework, helps data science teams create maintainable data pipelines.

Feb 26, 2022 12 minutes Ben Hayes

Folks have compared data to oil in the past, and while the metaphor may be tired, the process of building pipelines in both cases consumes valuable time and resources, and often results in a mess. Read more to learn about Kedro!

Getting Started with Databricks

Databricks is the Data + AI company and the provider of a leading data science and machine learning platform.

Jan 14, 2022 6 minutes Ben Hayes

Ingesting, cleaning, and analyzing data on Databricks accelerates your data science and machine learning projects by leveraging the underlying data lake and Spark engine. Continue reading to get started with Databricks.

Introduction to Hyperopt

Hyperopt is a popular tool for optimizing hyperparameters in machine learning and data science.

Dec 11, 2021 7 minutes Ben Hayes

Hyperopt enables you to perform optimization in parallel over a defined search space to find the optimal parameters. Continue reading to get started with Hyperopt.

Linear Optimization with MATLAB/GAMS

Optimization with MATLAB and GAMS can aid decision-makers in the for-profit, non-profit, and public sectors.

Apr 30, 2021 18 minutes Ben Hayes

Using optimization methods requires structuring problems, identifying objectives, and imposing constraints given a problem context. Continue reading to learn how to solve optimization problems in MATLAB and GAMS.

Exploring Probability Distributions

Probability distributions are used throughout quantitative fields where uncertainty or randomness is involved.

Mar 21, 2021 22 minutes Ben Hayes

Probabilistic thinking requires knowledge of probability distributions and the relevant statistics associated with each. Understanding these distributions can help you identify opportunities to leverage one. Continue reading to learn about the core concepts and essential distributions with examples in Python.

Interpreting Machine Learning with SHAP

SHAP, or SHapley Additive exPlanations, helps you understand ML models.

Feb 25, 2021 13 minutes Ben Hayes

The SHAP framework unifies the methods used to interpret and explain machine learning models. This post, in turn, helps interpret and explain SHAP. Read it to get started with SHAP through both a high-level explanation and a Python example.

Time Series Analysis with Facebook Prophet

Time series analysis is a powerful technique to generate forecasts with seasonal data.

Feb 6, 2021 11 minutes Ben Hayes

Data sets contain noise, but with high-powered or elegant data science, the relevant signal can be extracted. One key technique for analyzing real-world data, primarily for forecasting, is time series analysis. A popular time series forecasting procedure is Facebook's open-source Prophet, which is implemented in both R and Python.

Thank you, Carnegie Mellon University!

An expression of gratitude and summary of experience from one of Carnegie Mellon's latest graduates.

May 8, 2019 10 minutes Ben Hayes

Carnegie Mellon University's Heinz College offered a unique opportunity to jointly study public policy, management, and data analytics. The coursework covers topics ranging from machine learning, deep learning, econometrics, and optimization to organizational design and decision-making. Continue reading to learn why I am glad I enrolled.

What's Cooking? Cuisine Classification

Using only ingredients, learn how to classify recipes into cuisines using Python, classification methods, clustering methods, and scikit-learn.

Feb 17, 2019 14 minutes Ben Hayes

While seemingly a trivial task, classifying recipes into cuisines and understanding how to interpret clustering and classification results can help you creatively answer other questions. Continue reading to learn how.

Analyzing Suspicious IP Addresses

Suspicious connection requests are a common occurrence for Linux system administrators. From whom and where are these suspicious requests originating?

Jan 28, 2019 13 minutes Ben Hayes

Cyber attacks are becoming more common as more data are stored in digital form. Suspicious, and likely malicious, users are requesting access to unauthorized resources while hoping for vulnerable networks or systems. Continue reading to learn how to apply descriptive analytics to uncover who is generating these suspicious requests.

Introduction to Deep Learning with Python

From concept to code: introduce yourself to neural networks and deep learning.

Nov 11, 2018 11 minutes Ben Hayes

Neural networks are a growing area of research and are being applied to new problems every day. The classic examples are image classification, facial recognition, and self-driving cars. The future is uncertain, but there is a high probability that neural networks in some form will play a critical role in shaping it. Continue reading to learn about the core concepts and walk through an example in Python.

Python vs R: The Great Data Science Debate

Python and R are popular in data science. Which is best?

Oct 27, 2018 14 minutes Ben Hayes

Data science is a broad and deepening field but one divisive question still remains. In this post, I compare and contrast Python and R for use in different data science tasks. Continue reading to learn which one emerges triumphant.

Building Shiny Dashboards

Learn how to build a Shiny dashboard in R to help users analyze, visualize, and understand their data.

Sep 23, 2018 26 minutes Ben Hayes

Shiny dashboards provide a simple and fast way to analyze and visualize data. Whether performing exploratory data analysis or building a robust tool for your client's executives, Shiny dashboards aid the data science process. Continue reading to walk through an example of constructing an R Shiny Dashboard.

ACLU Data Science Internship

Data science for good: my internship with the ACLU in NYC

Aug 28, 2018 7 minutes Ben Hayes

Data science can change how any organization operates: not just Facebook or Amazon, but even the American Civil Liberties Union - an organization 10 times older than Facebook. Continue reading to learn about my experience interning with the ACLU's data science team in New York.

Predicting Criminal Recidivism with R

Can data science indicate what factors affect the rate of criminal or violent recidivism? (Hint: Yes)

Mar 20, 2018 45 minutes Ben Hayes

Around the United States, municipalities have turned to risk assessment instruments (RAIs) to help judges determine which individuals to release on bail and which ones to keep in custody. The risk assessment process varies based on the specific instrument used but many rely on criminal recidivism data sets. These data sets typically contain various demographic indicators (age, race, gender, etc.) and also criminal history (charges, juvenile record, etc.).

Broward County, Florida, has turned to one of the most popular RAIs today: COMPAS, or the Correctional Offender Management Profiling for Alternative Sanctions tool. COMPAS assesses individuals based on criminal history and social profiling to categorize an individual as low, medium, or high risk. This tool, however, was not developed using the Broward County data set, which may lead to poorly performing predictions for individuals from Broward County, Florida. In this post, we construct an RAI, compare it to COMPAS, and discuss the findings.

The Gender Income Gap

How do factors (besides gender) affect the gender income gap?

Feb 11, 2018 20 minutes Ben Hayes

The issue of gender-income disparity is not new - it is nuanced. To explore it, we rely on the National Longitudinal Survey of Youth, 1979 cohort, data set (abbreviated ‘NLSY79’). First, to draw strong conclusions, we must evaluate the data set provided - is it accurate, relevant, and useful for drawing statistical conclusions? Second, once we summarize the data, we can discuss methodology - how should we approach the data, what variables should we consider, what techniques are appropriate? Third, we openly discuss findings about the sampled individuals and attempt to infer relationships about the income difference (if any) between men and women in the larger population based on other factors. Lastly, we end with a discussion of the relevance and significance of our findings given the context of the survey data available and the methodology used.

Demystifying the Confusion Matrix

A simple 2x2 table can contain insightful metrics that enhance your decision-making.

Jan 22, 2018 7 minutes Ben Hayes

Understanding the confusion matrix is an important step in statistics, machine learning, or any other field where predictions or classifications are common. The confusion matrix is a two-dimensional contingency table that reveals how well a predictive model performs when the outcomes are known. Additionally, when the costs of incorrect positive and negative guesses differ, the trade-offs can be optimized. Do you know the difference between Sensitivity, Specificity, Recall, Precision, True Positive Rate, and Positive Predictive Value?
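
As a quick illustration of how those terms relate, here is a minimal Python sketch that derives them from the four cells of a 2x2 confusion matrix. The function name `confusion_metrics` and the counts are hypothetical, chosen only for illustration; they are not taken from the post.

```python
def confusion_metrics(tp, fp, fn, tn):
    """Compute common rates from the four cells of a 2x2 confusion matrix.

    Assumes every denominator is non-zero (illustrative sketch only).
    """
    return {
        # Sensitivity, Recall, and True Positive Rate are the same quantity.
        "sensitivity": tp / (tp + fn),
        # Specificity is the True Negative Rate.
        "specificity": tn / (tn + fp),
        # Precision and Positive Predictive Value are the same quantity.
        "precision": tp / (tp + fp),
    }


# Hypothetical counts: 40 true positives, 10 false positives,
# 20 false negatives, 30 true negatives.
metrics = confusion_metrics(tp=40, fp=10, fn=20, tn=30)
print(metrics)
```

Note that the six names in the teaser collapse into three distinct quantities: Sensitivity, Recall, and True Positive Rate coincide, as do Precision and Positive Predictive Value.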

New Year, New Website, Same Ben

How Hugo and I created this new personal blog.

Jan 1, 2018 4 minutes Ben Hayes

Read the story about how I replaced my old website (from 2009) with a new one that is more modern, features easy publishing, and requires low maintenance. Along the way, I include tips and tricks that will help you create your own website.
