Flashield's Blog

Just For My Daily Diary


01.course-how-models-work [How Models Work]

Introduction

We'll start with an overview of how machine learning models work and how they are used. This may feel basic if you've done statistical modeling or machine learning before. Don't worry, we will progress to building powerful models soon.

This course will have you build models as you go through the following scenario:

Your cousin has made millions of dollars speculating on real estate. He's offered to become business partners with you because of your interest in data science. He'll supply the money, and you'll supply models that predict how much various houses are worth.

You ask your cousin how he's predicted real estate values in the past, and he says it is just intuition. But more questioning reveals that he's identified price patterns from houses he has seen in the past, and he uses those patterns to make predictions for new houses he is considering.

Machine learning works the same way. We'll start with a model called the Decision Tree. There are fancier models that give more accurate predictions. But decision trees are easy to understand, and they are the basic building block for some of the best models in data science.

For simplicity, we'll start with the simplest possible decision tree.

[Figure: First Decision Trees]

This tree divides houses into only two categories. The predicted price for any house under consideration is the historical average price of houses in the same category.
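Here is a minimal sketch of this idea. Assume the tree splits on whether a house has more than 2 bedrooms, and use a handful of made-up historical sales. The prediction for a new house is then just the average price of the past houses in the same category. The split rule, the `category` and `predict_price` function names, and all prices below are hypothetical.

```python
from statistics import mean

# Hypothetical historical sales: (bedrooms, sale price in dollars).
history = [
    (2, 150_000),
    (2, 170_000),
    (3, 200_000),
    (4, 240_000),
]

def category(bedrooms):
    """Assumed split rule: does the house have more than 2 bedrooms?"""
    return bedrooms > 2

def predict_price(bedrooms):
    """Predict the average historical price of houses in the same category."""
    same_category = [price for b, price in history if category(b) == category(bedrooms)]
    return mean(same_category)

print(predict_price(2))  # → 160000 (average of the two 2-bedroom sales)
print(predict_price(5))  # → 220000 (average of the 3- and 4-bedroom sales)
```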

We use data to decide how to break the houses into two groups, and then again to determine the predicted price in each group. This step of capturing patterns from data is called fitting or training the model. The data used to fit the model is called the training data.

The details of how the model is fit (e.g., how to split up the data) are complex enough that we will save them for later. After the model has been fit, you can apply it to new data to predict prices of additional homes.
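The course leaves the details of fitting for later, but a simplified version can already be sketched. This sketch makes two assumptions. First, the tree splits on a single numeric feature. Second, the best split is the threshold that minimizes the sum of squared errors. That criterion is common for regression trees, but here it is an assumption, not the course's stated method. Fitting then means trying each candidate threshold, keeping the best one, and storing each group's average price. The names `sse`, `fit_stump`, and `predict` and the training data are hypothetical.

```python
from statistics import mean

def sse(prices):
    """Sum of squared errors when every price is predicted by the group mean."""
    if not prices:
        return 0.0
    m = mean(prices)
    return sum((p - m) ** 2 for p in prices)

def fit_stump(xs, ys):
    """Fit a one-split tree on feature xs with target prices ys.

    Returns (threshold, left_mean, right_mean): houses with x <= threshold
    fall in the left group, all others in the right group.
    """
    best = None
    # Candidate thresholds: every distinct value except the largest,
    # so both groups are always non-empty.
    for t in sorted(set(xs))[:-1]:
        left = [y for x, y in zip(xs, ys) if x <= t]
        right = [y for x, y in zip(xs, ys) if x > t]
        cost = sse(left) + sse(right)
        if best is None or cost < best[0]:
            best = (cost, t, mean(left), mean(right))
    if best is None:
        raise ValueError("need at least two distinct feature values to split")
    _, threshold, left_mean, right_mean = best
    return threshold, left_mean, right_mean

def predict(stump, x):
    """Apply the fitted stump to a new house's feature value."""
    threshold, left_mean, right_mean = stump
    return left_mean if x <= threshold else right_mean

# Hypothetical training data: bedrooms and sale prices.
bedrooms = [1, 2, 2, 3, 4, 4]
prices = [130_000, 150_000, 170_000, 230_000, 250_000, 270_000]

stump = fit_stump(bedrooms, prices)
print(stump)              # → (2, 150000, 250000)
print(predict(stump, 3))  # → 250000
```

Real libraries add refinements such as multiple features, deeper trees, and stopping rules. This sketch only shows the "choose the split, then average each group" idea described above.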

Improving the Decision Tree

Which of the following two decision trees is more likely to result from fitting the real estate training data?

[Figure: First Decision Trees]

The decision tree on the left (Decision Tree 1) probably makes more sense, because it captures the reality that houses with more bedrooms tend to sell at higher prices than houses with fewer bedrooms. The biggest shortcoming of this model is that it doesn't capture most factors affecting home price, such as the number of bathrooms, lot size, and location.

You can capture more factors using a tree that has more "splits." These are called "deeper" trees. A decision tree that also considers the total size of each house's lot might look like this:
[Figure: Depth 2 Tree]

You predict the price of any house by tracing through the decision tree, always picking the path corresponding to that house's characteristics. The predicted price for the house is at the bottom of the tree. The point at the bottom where we make a prediction is called a leaf.
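Tracing a house down the tree can be sketched with a small nested data structure. This sketch assumes a depth-2 tree that first splits on bedrooms and then on lot size. The thresholds, leaf prices, and the names `Leaf`, `Node`, and `predict` are hypothetical illustrations. They are not values from the figure or from any library.

```python
from dataclasses import dataclass
from typing import Union

@dataclass
class Leaf:
    price: float  # predicted price stored at the bottom of the tree

@dataclass
class Node:
    feature: str      # name of the house characteristic to test
    threshold: float  # go left if house[feature] <= threshold, else right
    left: "Tree"
    right: "Tree"

Tree = Union[Leaf, Node]

def predict(tree: Tree, house: dict) -> float:
    """Follow the branch matching the house's characteristics until a leaf."""
    while isinstance(tree, Node):
        tree = tree.left if house[tree.feature] <= tree.threshold else tree.right
    return tree.price

# Hypothetical depth-2 tree: split on bedrooms, then on lot size.
tree = Node(
    feature="bedrooms", threshold=2,
    left=Node("lot_size", 8_000, Leaf(140_000), Leaf(180_000)),
    right=Node("lot_size", 12_000, Leaf(170_000), Leaf(230_000)),
)

print(predict(tree, {"bedrooms": 3, "lot_size": 13_000}))  # → 230000 (right, then right)
print(predict(tree, {"bedrooms": 2, "lot_size": 5_000}))   # → 140000 (left, then left)
```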

The splits and values at the leaves will be determined by the data, so it's time for you to check out the data you will be working with.

Continue

Let's get more specific. It's time to Examine Your Data.
