
Posts by: jcsladcik

Retuning the Heavens: Machine Learning and Ancient Astronomy

What can we learn about machine learning from ancient astronomy?

When thinking about machine learning, it is easy to be model-centric and get caught up in the details of getting a new model up and running: preparing a dataset, partitioning the training and test data, engineering and selecting features, finding an appropriate metric, choosing a model, and tuning the hyperparameters. This model-centric view is reinforced by the fact that we don’t always have control over the data or how it was collected. In most cases, we are presented with a dataset collected by someone else and asked what we can make of it. As a result, it is easy to accept the data as given and overfit your thinking about machine learning to the specifics of your own modeling process and experience. Sometimes it is a good idea to step away from these details and remind yourself of the basic components of a model and its data, how they interact with each other, and how they evolve.
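For concreteness, here is a minimal sketch of that model-centric workflow, assuming scikit-learn and one of its built-in toy datasets; the pipeline steps and parameter grid are illustrative choices, not taken from the post itself.

# A minimal sketch of the model-centric workflow described above:
# split the data, select features, choose a model, tune hyperparameters.
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression

X, y = load_breast_cancer(return_X_y=True)

# Partition the data into training and test sets.
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Chain feature scaling, feature selection, and a model into one pipeline.
pipeline = Pipeline([
    ("scale", StandardScaler()),
    ("select", SelectKBest(f_classif)),
    ("model", LogisticRegression(max_iter=5000)),
])

# Tune hyperparameters against a chosen metric (accuracy here).
search = GridSearchCV(
    pipeline,
    param_grid={"select__k": [5, 10, 20], "model__C": [0.1, 1.0, 10.0]},
    scoring="accuracy",
)
search.fit(X_train, y_train)
print(search.best_params_, search.score(X_test, y_test))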

Read More

Extracting Target Labels from Deep Learning Classification Models

In the blog post Configuring a Neural Network Output Layer, we highlighted how to correctly set up an output layer for deep learning models. Here, we discuss how to make sense of what a neural network actually returns from its output layer. If you are like me, you may have been surprised when you first encountered the output of a simple classification neural net.
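As a preview of the idea, here is a minimal NumPy sketch: a classification network typically returns one probability (or logit) per class for each sample rather than the labels themselves, and the predicted label is recovered with argmax. The probabilities and class names below are made up for illustration.

import numpy as np

# Each row is one sample; each column is the network's score for a class.
probabilities = np.array([
    [0.10, 0.85, 0.05],   # sample 1: most likely class 1
    [0.70, 0.20, 0.10],   # sample 2: most likely class 0
])

# The predicted label is the index of the largest value in each row.
predicted_labels = np.argmax(probabilities, axis=1)
print(predicted_labels)  # [1 0]

# If the class indices map to names, translate them back.
class_names = np.array(["cat", "dog", "bird"])
print(class_names[predicted_labels])  # ['dog' 'cat']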

Read More

Choosing the Right Number of Clusters

Introduction

When I first started my machine learning journey, K-means clustering was one of the first algorithms I was introduced to, and it is still one of my favorites to this day. I was amazed at how elegant yet comprehensible the procedure was. There is something oddly satisfying about watching the cluster assignments and centroids update with each iteration. While K-means clustering has stood the test of time since its inception in the 1950s, it comes with one foundational requirement: choosing the correct number of clusters, the K in K-means. In this month’s newsletter, we’ll explore a technique known as the elbow method to help determine the ideal number of clusters for a given clustering task. To conclude, we will explore another type of clustering algorithm, Affinity Propagation, that does not require a predetermined number of clusters.
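As a taste of both techniques, here is a minimal sketch using scikit-learn on synthetic data; generating blobs with four centers is an illustrative assumption, not an example from the post.

# Elbow method: fit K-means for a range of K and record the inertia
# (within-cluster sum of squared distances); the bend suggests K.
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans, AffinityPropagation
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=300, centers=4, random_state=0)

inertias = [
    KMeans(n_clusters=k, n_init=10, random_state=0).fit(X).inertia_
    for k in range(1, 11)
]
plt.plot(range(1, 11), inertias, marker="o")
plt.xlabel("Number of clusters (K)")
plt.ylabel("Inertia")
plt.show()

# Affinity Propagation infers the number of clusters on its own.
ap = AffinityPropagation(random_state=0).fit(X)
print("Clusters found:", len(ap.cluster_centers_indices_))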

Read More

Prospecting for Data on the Web

Introduction

At Enthought we teach a lot of scientists and engineers about using Python and the ecosystem of scientific Python packages for processing, analyzing, and visualizing data. Most of what we teach involves nice, clean data sets: collections of data that have been carefully collected, scrubbed, and prepared for analysis. While we mention in passing the idea of collecting data from the web, work a few examples of general data cleanup, and show our students each of the tools needed, we seldom have enough time in class to follow a complete, practical example of web data prospecting from end to end. This newsletter should help remedy that.
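As a quick taste of what that prospecting looks like, here is a minimal sketch using pandas to pull a table straight from a web page; the URL is an illustrative example (not from the original post), and pd.read_html needs an HTML parser such as lxml installed.

import pandas as pd

# Fetch and parse every <table> element on the page.
url = "https://en.wikipedia.org/wiki/List_of_countries_by_population_(United_Nations)"
tables = pd.read_html(url)

# Pick the first table; real-world cleanup usually follows
# (renaming columns, coercing types, handling footnote markers, etc.).
df = tables[0]
print(df.head())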

The Problem

While the internet is a great resource for many things, including data, the web’s wild and tangled nature presents a few problems:

Read More