Andrew Young

Published in Towards Data Science

·Pinned

Every Data Scientist Should Know: The Bias-Variance Trade-off Generalization is Wrong

A groundbreaking and relatively new discovery upends classical statistics, with relevant implications for data science practitioners and statistical consultants. Introduction: Data science is a fascinating field. C-level executives are enamored of its promised impact on top-line revenue and practitioners are intrigued by the rapid pace of innovation. …

Data Science

11 min read

Published in Towards Data Science

·Pinned

Isolation Forest is the best Anomaly Detection Algorithm for Big Data Right Now

Isolation forest or “iForest” is an astoundingly beautiful and elegantly simple algorithm that identifies anomalies with few parameters. The original paper is accessible to a broad audience and contains minimal math. …

Anomaly Detection

8 min read

Jun 5, 2021

What is the Most Important Skill for a Data Scientist?

Hint: It’s not programming skills or familiarity with algorithms — The most important aspect of data science is communication. Algorithms, coding languages and software are important to know but these things are easily and quickly looked up when details become shrouded in the dust of time. …

Data Science

3 min read

Published in Towards Data Science

·May 25, 2021

Essential Big Data, Data Scientist Skill: How to Install JARs for an AWS EMR Cluster

Demonstrating where to download JARs and how to install them on AWS EMR clusters for access from EMR Notebooks — I have yet to see a straightforward and comprehensive guide on how to get JAR files onto every worker node of an EMR cluster and yet this is a critically important, common need. This article addresses those needs. …

Data Science

4 min read

Published in Towards Data Science

·May 25, 2021

Must-Know Presentation Tools for the Effective Data Scientist

Communicating a coherent, data-driven story is the most important skill for today’s data scientist, yet the least developed. Better tools can help; learn about a new one today. Over the years, I have seen many PhD-holding data scientists spend weeks or months building highly effective machine learning pipelines that (theoretically) will deliver real-world value. Unfortunately, these fruits of labor can die on the vine if their creators fail to effectively communicate the value of their work, a misfortune I…

Data Science

7 min read

Feb 17, 2021

How to Use Wireshark

A quick guide with code (i.e. my rough notes for replication purposes). Motivation: There are a lot of interesting applications for packet capture data. I will refrain from stating them for corporate privacy reasons. Instructions (Part 1: Wireshark GUI): This part is straightforward and useful for starting off. In part 2, I show you how to…

Wireshark

2 min read

Published in Towards Data Science

·Nov 17, 2020

How to Install XGBoost/CatBoost/etc. for an AWS EMR Notebook Environment

Setting up your Amazon Web Services (AWS) Elastic MapReduce (EMR) Cluster with XGBoost. Introduction: This article assumes you are already familiar with what XGBoost/CatBoost/etc. do and that you are here to actually get them to work. Installing packages on a local machine/single node is easy. Doing the same for a cluster environment in order to work with big data is less so and the…

Xgboost

9 min read

Aug 18, 2020

Connect to an AWS EMR Master Node with PuTTY: A Visual Guide

Because AWS documentation is out of date, wrong, verbose yet not specific enough, or requires you to read 5–10 different link trees of documentation pages. 1. Download the latest stable release of PuTTY (e.g. putty-64bit-0.74-installer.msi). The installer should also install needed utilities like puttygen and pageant. 2. Create an EMR instance (guide…

Putty

5 min read

Apr 23, 2020

Flourish: A Data Visualization Creator for Websites

Produce website-worthy visualizations. Introduction: Flourish is a simple, browser-based, point-and-click, drag-and-drop data visualization creator suitable for well-structured, tabular data in the form of .csv or Excel files. Introduced in March 2016¹, it is a relatively new tool compared to entrenched competitors like Tableau (founded in 2003²) and aims at an audience…

Data Visualization

5 min read

Dec 14, 2019

How to Be an Inspiring and Effective Data Science Leader

I have had many managers and have held multiple leadership roles during my life. This is a living list of notes based on those experiences. While the following concepts are widely applicable, I write with the data science team in mind. When given headcount, hiring should be your top priority. You can…

Data Science

6 min read

Andrew Young

a data scientist https://www.linkedin.com/in/andrewyoung16/

Following
  • plotly
  • Andrea
  • Satish Chandra Gupta
  • P.J! Parmar
  • Rajiv Shah
