Flashield's Blog

Just For My Daily Diary

04.exercise-model-validation [Exercise: Model Validation]

This notebook is an exercise in the Introduction to Machine Learning course. You can reference the tutorial at this link.


Recap

You've built a model. In this exercise you will test how good your model is.

Run the cell below to set up your coding environment where the previous exercise left off.


# Code you have previously used to load data
import pandas as pd
from sklearn.tree import DecisionTreeRegressor

# Path of the file to read
iowa_file_path = '../input/home-data-for-ml-course/train.csv'

home_data = pd.read_csv(iowa_file_path)
y = home_data.SalePrice
feature_columns = ['LotArea', 'YearBuilt', '1stFlrSF', '2ndFlrSF', 'FullBath', 'BedroomAbvGr', 'TotRmsAbvGrd']
X = home_data[feature_columns]

# Specify Model
iowa_model = DecisionTreeRegressor()
# Fit Model
iowa_model.fit(X, y)

print("First in-sample predictions:", iowa_model.predict(X.head()))
print("Actual target values for those homes:", y.head().tolist())

# Set up code checking
from learntools.core import binder
binder.bind(globals())
from learntools.machine_learning.ex4 import *
print("Setup Complete")
First in-sample predictions: [208500. 181500. 223500. 140000. 250000.]
Actual target values for those homes: [208500, 181500, 223500, 140000, 250000]
Setup Complete

Exercises


Step 1: Split Your Data

Use the train_test_split function to split up your data.

Give it the argument random_state=1 so the check functions know what to expect when verifying your code.

Recall, your features are loaded in the DataFrame X and your target is loaded in y.


# Import the train_test_split function and uncomment
# from _ import _

from sklearn.model_selection import train_test_split
# fill in and uncomment
train_X, val_X, train_y, val_y = train_test_split(X, y, random_state=1)

# Check your answer
step_1.check()

Correct

# The lines below will show you a hint or the solution.
# step_1.hint() 
step_1.solution()

Solution:

from sklearn.model_selection import train_test_split
train_X, val_X, train_y, val_y = train_test_split(X, y, random_state=1)
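When neither test_size nor train_size is given, train_test_split holds out 25% of the rows for validation (scikit-learn's current default). A minimal sketch with toy arrays (X_demo and y_demo are made-up stand-ins, not the Iowa data) that makes the split sizes explicit:

```python
import numpy as np
from sklearn.model_selection import train_test_split

X_demo = np.arange(20).reshape(10, 2)  # 10 toy rows, 2 features
y_demo = np.arange(10)                 # 10 toy targets

# test_size=0.25 mirrors the default holdout fraction explicitly.
train_X, val_X, train_y, val_y = train_test_split(
    X_demo, y_demo, test_size=0.25, random_state=1)

print(len(train_X), len(val_X))  # 7 3
```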

Step 2: Specify and Fit the Model

Create a DecisionTreeRegressor model and fit it to the relevant data.
Set random_state to 1 again when creating the model.


# You imported DecisionTreeRegressor in your last exercise
# and that code has been copied to the setup code above. So, no need to
# import it again

# Specify the model
iowa_model = DecisionTreeRegressor(random_state=1)

# Fit iowa_model with the training data.
iowa_model.fit(train_X, train_y)

# Check your answer
step_2.check()
[186500. 184000. 130000.  92000. 164500. 220000. 335000. 144152. 215000.
 262000.]
[186500. 184000. 130000.  92000. 164500. 220000. 335000. 144152. 215000.
 262000.]

Correct

# step_2.hint()
step_2.solution()

Solution:

iowa_model = DecisionTreeRegressor(random_state=1)
iowa_model.fit(train_X, train_y)
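Setting random_state=1 pins down the randomness the tree uses (e.g. for tie-breaking between equally good splits), so repeated fits on the same data produce the same model, which is what lets the check functions verify your result. A small sketch on made-up data (X_demo and y_demo are placeholders, not the Iowa data):

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.RandomState(0)
X_demo = rng.uniform(size=(50, 3))  # toy features
y_demo = rng.uniform(size=50)       # toy target

# Two models fit with the same random_state make identical predictions.
preds = [
    DecisionTreeRegressor(random_state=1).fit(X_demo, y_demo).predict(X_demo)
    for _ in range(2)
]
print(np.array_equal(preds[0], preds[1]))  # True
```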

Step 3: Make Predictions with Validation data


# Predict with all validation observations
val_predictions = iowa_model.predict(val_X)

# Check your answer
step_3.check()

Correct

# step_3.hint()
step_3.solution()

Solution:

val_predictions = iowa_model.predict(val_X)

Inspect your predictions and actual values from validation data.


# print the top few validation predictions
print(val_predictions[:5])
# print the top few actual prices from validation data
print(val_y[:5])
[186500. 184000. 130000.  92000. 164500.]
258     231500
267     179500
288     122000
649      84500
1233    142000
Name: SalePrice, dtype: int64

What do you notice that is different from what you saw with the in-sample predictions (printed after the top code cell on this page)?

Do you remember why validation predictions differ from in-sample (or training) predictions? This is an important idea from the last lesson.
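The gap to notice: the in-sample predictions at the top of the page matched the actual prices exactly, while the validation predictions do not. An unconstrained decision tree can memorize its training rows, so in-sample error is near zero and says little about accuracy on new data. A sketch on synthetic data (not the Iowa dataset) that reproduces the gap:

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor
from sklearn.metrics import mean_absolute_error

rng = np.random.RandomState(1)
X_demo = rng.uniform(0, 10, size=(200, 1))           # synthetic feature
y_demo = 3 * X_demo.ravel() + rng.normal(0, 2, 200)  # noisy linear target

train_X, val_X, train_y, val_y = train_test_split(X_demo, y_demo, random_state=1)

model = DecisionTreeRegressor(random_state=1).fit(train_X, train_y)

# The unconstrained tree memorizes its training rows...
in_sample_mae = mean_absolute_error(train_y, model.predict(train_X))
# ...while validation error reflects accuracy on unseen rows.
val_mae = mean_absolute_error(val_y, model.predict(val_X))

print(in_sample_mae < val_mae)  # True: training error is (near) zero
```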


Step 4: Calculate the Mean Absolute Error in Validation Data


from sklearn.metrics import mean_absolute_error
val_mae = mean_absolute_error(val_y, val_predictions)

# uncomment following line to see the validation_mae
print(val_mae)

# Check your answer
step_4.check()
29652.931506849316

Correct

# step_4.hint()
step_4.solution()

Solution:

val_mae = mean_absolute_error(val_y, val_predictions)
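Under the hood, mean_absolute_error just averages the absolute differences |actual − predicted| (which is also why the argument order doesn't change the result). A quick check with made-up prices:

```python
import numpy as np
from sklearn.metrics import mean_absolute_error

actual = np.array([208500, 181500, 223500])     # made-up "true" prices
predicted = np.array([207500, 183500, 226500])  # made-up predictions

# MAE is the mean of the absolute errors: (1000 + 2000 + 3000) / 3
manual_mae = np.mean(np.abs(actual - predicted))
sklearn_mae = mean_absolute_error(actual, predicted)
print(manual_mae, sklearn_mae)  # 2000.0 2000.0
```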

Is that MAE good? There isn't a general rule for what values are good that applies across applications. But you'll see how to use (and improve) this number in the next step.


Keep Going

You are ready for Underfitting and Overfitting.



Have questions or comments? Visit the Learn Discussion forum to chat with other Learners.

