Flashield's Blog

Just For My Daily Diary


Machine Learning

04.exercise-clustering-with-k-means【Exercise: Clustering with K-Means】

This notebook is an exercise in the Feature Engineering course. You can reference the tutorial at this link. Introduction In this exercise you'll explore our first unsupervised learning technique for creating features: k-means clustering. Run this cell to set everything up! # Setup feedback system from learntools.core import binder binder.bind(globals()) from […]

04.course-clustering-with-k-means【Clustering with K-Means】

Introduction This lesson and the next make use of what are known as unsupervised learning algorithms. Unsupervised algorithms don't make use of a target; instead, their purpose is to learn some property of the data, to represent the structure of the features in a certain way. In the context of feature engineering for prediction, […]
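The course applies this idea by using cluster assignments themselves as a new feature. A minimal sketch with scikit-learn, using a synthetic 2-D dataset in place of the course's actual data:

```python
# Sketch: k-means cluster labels as an engineered feature.
# Synthetic data for illustration; the course uses real housing data.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))

# Fit k-means and treat each point's cluster label as a new
# categorical feature describing where it sits in feature space.
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0)
cluster_labels = kmeans.fit_predict(X)

# Append the label column to the original feature matrix.
X_with_cluster = np.column_stack([X, cluster_labels])
print(X_with_cluster.shape)  # (100, 3)
```

In practice the label would be one-hot encoded (or target-encoded) before being fed to a model, since cluster IDs have no numeric order.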

03.exercise-creating-features【Exercise: Creating Features】

This notebook is an exercise in the Feature Engineering course. You can reference the tutorial at this link. Introduction In this exercise you'll start developing the features you identified in Exercise 2 as having the most potential. As you work through this exercise, you might take a moment to look at the data documentation […]

03.course-creating-features【Creating Features】

Introduction Once you've identified a set of features with some potential, it's time to start developing them. In this lesson, you'll learn a number of common transformations you can do entirely in Pandas. If you're feeling rusty, we've got a great course on Pandas. We'll use four datasets […]
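Two of the most common pandas transformations of this kind are mathematical combinations of numeric columns and one-hot encoding of categoricals. A minimal sketch, using hypothetical column names rather than the course's four datasets:

```python
# Sketch: common feature transformations done entirely in pandas.
# Column names here are illustrative stand-ins.
import pandas as pd

df = pd.DataFrame({
    "GrLivArea": [1500, 2000, 1200],
    "LotArea": [8000, 9600, 7000],
    "BldgType": ["1Fam", "TwnhsE", "1Fam"],
})

# Mathematical transform: a ratio of two numeric columns.
df["LivLotRatio"] = df["GrLivArea"] / df["LotArea"]

# Categorical transform: one-hot encode a label column.
df = pd.get_dummies(df, columns=["BldgType"], prefix="Bldg")
print(df.columns.tolist())
```

Ratios like this often expose relationships a tree or linear model cannot easily learn from the raw columns alone.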

02.exercise-mutual-information【Exercise: Mutual Information】

This notebook is an exercise in the Feature Engineering course. You can reference the tutorial at this link. Introduction In this exercise you'll identify an initial set of features in the Ames dataset to develop, using mutual information scores and interaction plots. Run this cell to set everything up! # […]

02.course-mutual-information【Mutual Information】

Introduction First encountering a new dataset can sometimes feel overwhelming. You might be presented with hundreds or thousands of features without even a description to go by. Where do you even begin? A great first step is to construct a ranking with a feature utility metric, a function measuring associations between […]
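The metric this lesson uses is mutual information. A minimal sketch of such a ranking with scikit-learn's `mutual_info_regression`, on synthetic data standing in for the Ames dataset:

```python
# Sketch: rank features by mutual information with the target.
# Synthetic data for illustration only.
import numpy as np
import pandas as pd
from sklearn.feature_selection import mutual_info_regression

rng = np.random.default_rng(0)
X = pd.DataFrame({
    "informative": rng.normal(size=200),
    "noise": rng.normal(size=200),
})
# Target depends strongly on one feature, not at all on the other.
y = 3 * X["informative"] + rng.normal(scale=0.1, size=200)

# Higher MI score = stronger association with the target.
scores = pd.Series(
    mutual_info_regression(X, y, random_state=0),
    index=X.columns,
).sort_values(ascending=False)
print(scores)
```

Unlike correlation, mutual information also picks up nonlinear associations, which is why it works well as a first-pass screen.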

07.exercise-data-leakage【Exercise: Data Leakage】

This notebook is an exercise in the Intermediate Machine Learning course. You can reference the tutorial at this link. Most people find target leakage very tricky until they've thought about it for a long time. So, before trying to think about leakage in the housing price example, we'll go through a few examples in […]

07.course-data-leakage【Data Leakage】

In this tutorial, you will learn what data leakage is and how to prevent it. If you don't know how to prevent it, leakage will come up frequently, and it will ruin your models in subtle and dangerous ways. So, this is one of the most important concepts for practicing data scientists. […]
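One common form, train-test contamination, happens when preprocessing is fit on data that later serves as validation data. A minimal sketch of the standard fix, keeping the preprocessing inside a scikit-learn pipeline so it is refit on each training fold only (synthetic data for illustration):

```python
# Sketch: avoid train-test contamination with a Pipeline.
# The scaler is fit only on each CV training fold, never on the
# held-out validation fold. Synthetic data for illustration.
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = (X[:, 0] > 0).astype(int)

# cross_val_score clones and refits the whole pipeline per fold,
# so no statistic computed from validation rows leaks into training.
pipe = make_pipeline(StandardScaler(), LogisticRegression())
scores = cross_val_score(pipe, X, y, cv=5)
print(scores.mean())
```

Fitting the scaler on the full dataset before splitting would leak validation-fold statistics into training, inflating the cross-validation score.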

06.exercise-xgboost【Exercise: XGBoost】

This notebook is an exercise in the Intermediate Machine Learning course. You can reference the tutorial at this link. In this exercise, you will use your new knowledge to train a model with gradient boosting. Setup The questions below will give you feedback on your work. Run the following cell to set up […]

06.course-xgboost【XGBoost】

In this tutorial, you will learn how to build and optimize models with gradient boosting. This method dominates many Kaggle competitions and achieves state-of-the-art results on a variety of datasets. Introduction For much of this course, you have made predictions with the random forest method, which achieves better performance than […]
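A minimal sketch of the core idea, using scikit-learn's `GradientBoostingRegressor` as a stand-in for XGBoost (same principle: many small trees added sequentially, with `n_estimators` and `learning_rate` traded off against each other):

```python
# Sketch: gradient boosting regression on synthetic data.
# GradientBoostingRegressor stands in for XGBoost here; the XGBoost
# API (XGBRegressor) is near-identical for this basic usage.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_absolute_error

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 4))
y = X[:, 0] * 2 + X[:, 1] ** 2 + rng.normal(scale=0.1, size=300)

X_tr, X_va, y_tr, y_va = train_test_split(X, y, random_state=0)

# Low learning rate + many estimators is the usual starting point.
model = GradientBoostingRegressor(
    n_estimators=500, learning_rate=0.05, random_state=0
)
model.fit(X_tr, y_tr)
mae = mean_absolute_error(y_va, model.predict(X_va))
print(mae)
```

With XGBoost proper you would also pass an evaluation set and use early stopping to pick the number of trees automatically.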
