Flashield's Blog

Just For My Daily Diary

04.exercise-model-validation [Exercise: Model Validation]

This notebook is an exercise in the Introduction to Machine Learning course. You can reference the tutorial at this link.


Recap

You've built a model. In this exercise you will test how good your model is.

Run the cell below to set up your coding environment where the previous exercise left off.


# Code you have previously used to load data
import pandas as pd
from sklearn.tree import DecisionTreeRegressor

# Path of the file to read
iowa_file_path = '../input/home-data-for-ml-course/train.csv'

home_data = pd.read_csv(iowa_file_path)
y = home_data.SalePrice
feature_columns = ['LotArea', 'YearBuilt', '1stFlrSF', '2ndFlrSF', 'FullBath', 'BedroomAbvGr', 'TotRmsAbvGrd']
X = home_data[feature_columns]

# Specify Model
iowa_model = DecisionTreeRegressor()
# Fit Model
iowa_model.fit(X, y)

print("First in-sample predictions:", iowa_model.predict(X.head()))
print("Actual target values for those homes:", y.head().tolist())

# Set up code checking
from learntools.core import binder
binder.bind(globals())
from learntools.machine_learning.ex4 import *
print("Setup Complete")
First in-sample predictions: [208500. 181500. 223500. 140000. 250000.]
Actual target values for those homes: [208500, 181500, 223500, 140000, 250000]
Setup Complete

Exercises


Step 1: Split Your Data

Use the train_test_split function to split up your data.

Give it the argument random_state=1 so the check functions know what to expect when verifying your code.

Recall, your features are loaded in the DataFrame X and your target is loaded in y.


# Import the train_test_split function and uncomment
# from _ import _

from sklearn.model_selection import train_test_split
# fill in and uncomment
train_X, val_X, train_y, val_y = train_test_split(X, y, random_state=1)

# Check your answer
step_1.check()

Correct

# The lines below will show you a hint or the solution.
# step_1.hint() 
step_1.solution()

Solution:

from sklearn.model_selection import train_test_split
train_X, val_X, train_y, val_y = train_test_split(X, y, random_state=1)
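When neither test_size nor train_size is given, train_test_split holds out 25% of the rows for validation (scikit-learn's current default). A minimal sketch with toy arrays (X_demo and y_demo are made-up stand-ins, not the Iowa data) that makes the split sizes explicit:

```python
import numpy as np
from sklearn.model_selection import train_test_split

X_demo = np.arange(20).reshape(10, 2)  # 10 toy rows, 2 features
y_demo = np.arange(10)                 # 10 toy targets

# test_size=0.25 mirrors the default holdout fraction explicitly.
train_X, val_X, train_y, val_y = train_test_split(
    X_demo, y_demo, test_size=0.25, random_state=1)

print(len(train_X), len(val_X))  # 7 3
```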

Step 2: Specify and Fit the Model

Create a DecisionTreeRegressor model and fit it to the relevant data.
Set random_state to 1 again when creating the model.


# You imported DecisionTreeRegressor in your last exercise
# and that code has been copied to the setup code above. So, no need to
# import it again

# Specify the model
iowa_model = DecisionTreeRegressor(random_state=1)

# Fit iowa_model with the training data.
iowa_model.fit(train_X, train_y)

# Check your answer
step_2.check()
[186500. 184000. 130000.  92000. 164500. 220000. 335000. 144152. 215000.
 262000.]
[186500. 184000. 130000.  92000. 164500. 220000. 335000. 144152. 215000.
 262000.]

Correct

# step_2.hint()
step_2.solution()

Solution:

iowa_model = DecisionTreeRegressor(random_state=1)
iowa_model.fit(train_X, train_y)
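Setting random_state=1 pins down the randomness the tree uses (e.g. for tie-breaking between equally good splits), so repeated fits on the same data produce the same model, which is what lets the check functions verify your result. A small sketch on made-up data (X_demo and y_demo are placeholders, not the Iowa data):

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.RandomState(0)
X_demo = rng.uniform(size=(50, 3))  # toy features
y_demo = rng.uniform(size=50)       # toy target

# Two models fit with the same random_state make identical predictions.
preds = [
    DecisionTreeRegressor(random_state=1).fit(X_demo, y_demo).predict(X_demo)
    for _ in range(2)
]
print(np.array_equal(preds[0], preds[1]))  # True
```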

Step 3: Make Predictions with Validation data


# Predict with all validation observations
val_predictions = iowa_model.predict(val_X)

# Check your answer
step_3.check()

Correct

# step_3.hint()
step_3.solution()

Solution:

val_predictions = iowa_model.predict(val_X)

Inspect your predictions and actual values from validation data.


# print the top few validation predictions
print(val_predictions[:5])
# print the top few actual prices from validation data
print(val_y[:5])
[186500. 184000. 130000.  92000. 164500.]
258     231500
267     179500
288     122000
649      84500
1233    142000
Name: SalePrice, dtype: int64

What do you notice that is different from what you saw with the in-sample predictions (printed after the top code cell on this page)?

Do you remember why validation predictions differ from in-sample (or training) predictions? This is an important idea from the last lesson.
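The gap to notice: the in-sample predictions at the top of the page matched the actual prices exactly, while the validation predictions do not. An unconstrained decision tree can memorize its training rows, so in-sample error is near zero and says little about accuracy on new data. A sketch on synthetic data (not the Iowa dataset) that reproduces the gap:

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor
from sklearn.metrics import mean_absolute_error

rng = np.random.RandomState(1)
X_demo = rng.uniform(0, 10, size=(200, 1))           # synthetic feature
y_demo = 3 * X_demo.ravel() + rng.normal(0, 2, 200)  # noisy linear target

train_X, val_X, train_y, val_y = train_test_split(X_demo, y_demo, random_state=1)

model = DecisionTreeRegressor(random_state=1).fit(train_X, train_y)

# The unconstrained tree memorizes its training rows...
in_sample_mae = mean_absolute_error(train_y, model.predict(train_X))
# ...while validation error reflects accuracy on unseen rows.
val_mae = mean_absolute_error(val_y, model.predict(val_X))

print(in_sample_mae < val_mae)  # True: training error is (near) zero
```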


Step 4: Calculate the Mean Absolute Error in Validation Data


from sklearn.metrics import mean_absolute_error
val_mae = mean_absolute_error(val_y, val_predictions)

# uncomment following line to see the validation_mae
print(val_mae)

# Check your answer
step_4.check()
29652.931506849316

Correct

# step_4.hint()
step_4.solution()

Solution:

val_mae = mean_absolute_error(val_y, val_predictions)
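Under the hood, mean_absolute_error just averages the absolute differences |actual − predicted| (which is also why the argument order doesn't change the result). A quick check with made-up prices:

```python
import numpy as np
from sklearn.metrics import mean_absolute_error

actual = np.array([208500, 181500, 223500])     # made-up "true" prices
predicted = np.array([207500, 183500, 226500])  # made-up predictions

# MAE is the mean of the absolute errors: (1000 + 2000 + 3000) / 3
manual_mae = np.mean(np.abs(actual - predicted))
sklearn_mae = mean_absolute_error(actual, predicted)
print(manual_mae, sklearn_mae)  # 2000.0 2000.0
```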

Is that MAE good? There isn't a general rule for what values are good that applies across applications. But you'll see how to use (and improve) this number in the next step.


Keep Going

You are ready for Underfitting and Overfitting.



Have questions or comments? Visit the Learn Discussion forum to chat with other Learners.

