This notebook is an exercise in the Introduction to Machine Learning course. You can reference the tutorial at this link.

Recap

So far, you have loaded your data and reviewed it with the following code. Run this cell to set up your coding environment where the previous step left off.

# Code you have previously used to load data
import pandas as pd

# Path of the file to read
iowa_file_path = '../input/home-data-for-ml-course/train.csv'

home_data = pd.read_csv(iowa_file_path)

# Set up code checking
from learntools.core import binder
binder.bind(globals())
from learntools.machine_learning.ex3 import *

print("Setup Complete")

Setup Complete

Exercises

Step 1: Specify Prediction Target

Select the target variable, which corresponds to the sales price. Save this to a new variable called y. You'll need to print a list of the columns to find the name of the column you need.

# print the list of columns in the dataset to find the name of the prediction target
home_data.info()


<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1460 entries, 0 to 1459
Data columns (total 81 columns):
 #   Column         Non-Null Count  Dtype  
---  ------         --------------  -----  
 0   Id             1460 non-null   int64  
 1   MSSubClass     1460 non-null   int64  
 2   MSZoning       1460 non-null   object 
 3   LotFrontage    1201 non-null   float64
 4   LotArea        1460 non-null   int64  
 5   Street         1460 non-null   object 
 6   Alley          91 non-null     object 
 7   LotShape       1460 non-null   object 
 8   LandContour    1460 non-null   object 
 9   Utilities      1460 non-null   object 
 10  LotConfig      1460 non-null   object 
 11  LandSlope      1460 non-null   object 
 12  Neighborhood   1460 non-null   object 
 13  Condition1     1460 non-null   object 
 14  Condition2     1460 non-null   object 
 15  BldgType       1460 non-null   object 
 16  HouseStyle     1460 non-null   object 
 17  OverallQual    1460 non-null   int64  
 18  OverallCond    1460 non-null   int64  
 19  YearBuilt      1460 non-null   int64  
 20  YearRemodAdd   1460 non-null   int64  
 21  RoofStyle      1460 non-null   object 
 22  RoofMatl       1460 non-null   object 
 23  Exterior1st    1460 non-null   object 
 24  Exterior2nd    1460 non-null   object 
 25  MasVnrType     588 non-null    object 
 26  MasVnrArea     1452 non-null   float64
 27  ExterQual      1460 non-null   object 
 28  ExterCond      1460 non-null   object 
 29  Foundation     1460 non-null   object 
 30  BsmtQual       1423 non-null   object 
 31  BsmtCond       1423 non-null   object 
 32  BsmtExposure   1422 non-null   object 
 33  BsmtFinType1   1423 non-null   object 
 34  BsmtFinSF1     1460 non-null   int64  
 35  BsmtFinType2   1422 non-null   object 
 36  BsmtFinSF2     1460 non-null   int64  
 37  BsmtUnfSF      1460 non-null   int64  
 38  TotalBsmtSF    1460 non-null   int64  
 39  Heating        1460 non-null   object 
 40  HeatingQC      1460 non-null   object 
 41  CentralAir     1460 non-null   object 
 42  Electrical     1459 non-null   object 
 43  1stFlrSF       1460 non-null   int64  
 44  2ndFlrSF       1460 non-null   int64  
 45  LowQualFinSF   1460 non-null   int64  
 46  GrLivArea      1460 non-null   int64  
 47  BsmtFullBath   1460 non-null   int64  
 48  BsmtHalfBath   1460 non-null   int64  
 49  FullBath       1460 non-null   int64  
 50  HalfBath       1460 non-null   int64  
 51  BedroomAbvGr   1460 non-null   int64  
 52  KitchenAbvGr   1460 non-null   int64  
 53  KitchenQual    1460 non-null   object 
 54  TotRmsAbvGrd   1460 non-null   int64  
 55  Functional     1460 non-null   object 
 56  Fireplaces     1460 non-null   int64  
 57  FireplaceQu    770 non-null    object 
 58  GarageType     1379 non-null   object 
 59  GarageYrBlt    1379 non-null   float64
 60  GarageFinish   1379 non-null   object 
 61  GarageCars     1460 non-null   int64  
 62  GarageArea     1460 non-null   int64  
 63  GarageQual     1379 non-null   object 
 64  GarageCond     1379 non-null   object 
 65  PavedDrive     1460 non-null   object 
 66  WoodDeckSF     1460 non-null   int64  
 67  OpenPorchSF    1460 non-null   int64  
 68  EnclosedPorch  1460 non-null   int64  
 69  3SsnPorch      1460 non-null   int64  
 70  ScreenPorch    1460 non-null   int64  
 71  PoolArea       1460 non-null   int64  
 72  PoolQC         7 non-null      object 
 73  Fence          281 non-null    object 
 74  MiscFeature    54 non-null     object 
 75  MiscVal        1460 non-null   int64  
 76  MoSold         1460 non-null   int64  
 77  YrSold         1460 non-null   int64  
 78  SaleType       1460 non-null   object 
 79  SaleCondition  1460 non-null   object 
 80  SalePrice      1460 non-null   int64  
dtypes: float64(3), int64(35), object(43)
memory usage: 924.0+ KB

y = home_data['SalePrice']

# Check your answer
step_1.check()

Correct

# The lines below will show you a hint or the solution.
# step_1.hint() 
step_1.solution()

Solution:

y = home_data.SalePrice
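
The answer above uses bracket notation and the solution uses dot notation; both select the same column. As a minimal sketch, the snippet below demonstrates this on a hypothetical three-row frame (toy_homes and its values are made up for illustration and are not the course data). It also shows one reason to prefer brackets: a column name such as 1stFlrSF is not a valid Python identifier, so dot notation cannot reach it.

```python
import pandas as pd

# Hypothetical three-row frame standing in for home_data (illustrative values only)
toy_homes = pd.DataFrame({
    "LotArea": [8450, 9600, 11250],
    "1stFlrSF": [856, 1262, 920],
    "SalePrice": [208500, 181500, 223500],
})

# List the column names to locate the prediction target
column_names = toy_homes.columns.tolist()
print(column_names)

# Bracket notation and dot notation return the same Series
y_bracket = toy_homes["SalePrice"]
y_dot = toy_homes.SalePrice
print(y_bracket.equals(y_dot))

# A name that starts with a digit is not a valid identifier, so use brackets
first_floor = toy_homes["1stFlrSF"]
print(first_floor.tolist())
```

If info() prints more detail than you need, home_data.columns is a shorter way to see just the names.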

Step 2: Create X

Now you will create a DataFrame called X holding the predictive features.

Since you want only some columns from the original data, you'll first create a list with the names of the columns you want in X.

You'll use just the following columns in the list (you can copy and paste the whole list to save some typing, though you'll still need to add quotes):

LotArea
YearBuilt
1stFlrSF
2ndFlrSF
FullBath
BedroomAbvGr
TotRmsAbvGrd

After you've created that list of features, use it to create the DataFrame that you'll use to fit the model.

# Create the list of features below
feature_names = ['LotArea', 'YearBuilt', '1stFlrSF', '2ndFlrSF', 'FullBath', 'BedroomAbvGr', 'TotRmsAbvGrd']

# Select data corresponding to features in feature_names
X = home_data[feature_names]

# Check your answer
step_2.check()

Correct

# step_2.hint()
step_2.solution()

Solution:

feature_names = ["LotArea", "YearBuilt", "1stFlrSF", "2ndFlrSF",
                      "FullBath", "BedroomAbvGr", "TotRmsAbvGrd"]

X=home_data[feature_names]
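
Indexing with a list of names returns a DataFrame, while indexing with a single name returns a Series. The sketch below illustrates the difference on a hypothetical toy frame (toy_homes, toy_features, and the values are assumptions for illustration, not the course data).

```python
import pandas as pd

# Hypothetical toy frame (illustrative values only)
toy_homes = pd.DataFrame({
    "LotArea": [8450, 9600, 11250],
    "YearBuilt": [2003, 1976, 2001],
    "SalePrice": [208500, 181500, 223500],
})

# A list of column names selects several columns and returns a DataFrame
toy_features = ["LotArea", "YearBuilt"]
X_toy = toy_homes[toy_features]

# A single column name returns a one-dimensional Series
lot_area = toy_homes["LotArea"]

print(type(X_toy).__name__, X_toy.shape)
print(type(lot_area).__name__, lot_area.shape)
```

scikit-learn estimators expect a two-dimensional feature input, which is why X is built from a list even when you would only need one column.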

Review Data

Before building a model, take a quick look at X to verify that it looks sensible.

# Review data
# Print summary statistics for X
print(X.describe())

# Print the first few rows
print(X.head())

             LotArea    YearBuilt     1stFlrSF     2ndFlrSF     FullBath  \
count    1460.000000  1460.000000  1460.000000  1460.000000  1460.000000   
mean    10516.828082  1971.267808  1162.626712   346.992466     1.565068   
std      9981.264932    30.202904   386.587738   436.528436     0.550916   
min      1300.000000  1872.000000   334.000000     0.000000     0.000000   
25%      7553.500000  1954.000000   882.000000     0.000000     1.000000   
50%      9478.500000  1973.000000  1087.000000     0.000000     2.000000   
75%     11601.500000  2000.000000  1391.250000   728.000000     2.000000   
max    215245.000000  2010.000000  4692.000000  2065.000000     3.000000   

       BedroomAbvGr  TotRmsAbvGrd  
count   1460.000000   1460.000000  
mean       2.866438      6.517808  
std        0.815778      1.625393  
min        0.000000      2.000000  
25%        2.000000      5.000000  
50%        3.000000      6.000000  
75%        3.000000      7.000000  
max        8.000000     14.000000  
   LotArea  YearBuilt  1stFlrSF  2ndFlrSF  FullBath  BedroomAbvGr  \
0     8450       2003       856       854         2             3   
1     9600       1976      1262         0         2             3   
2    11250       2001       920       866         2             3   
3     9550       1915       961       756         1             3   
4    14260       2000      1145      1053         2             4   

   TotRmsAbvGrd  
0             8  
1             6  
2             6  
3             7  
4             9
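
The count row above reads 1460 for every feature, matching the 1460 rows in the dataset, so none of the selected columns has missing values. A more direct way to check this is to count nulls per column. The sketch below does so on a hypothetical toy frame (toy and its values are assumptions for illustration) in which one LotFrontage entry is deliberately missing.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame with one missing LotFrontage value (illustrative only)
toy = pd.DataFrame({
    "LotArea": [8450, 9600, 11250],
    "LotFrontage": [65.0, np.nan, 68.0],
})

# Count the missing entries in each column
missing_per_column = toy.isnull().sum()
print(missing_per_column)
```

On the exercise data, the equivalent call would be X.isnull().sum().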

Step 3: Specify and Fit Model

Create a DecisionTreeRegressor and save it as iowa_model. Ensure you've done the relevant import from sklearn to run this command.

Then fit the model you just created using the data in X and y that you saved above.

from sklearn.tree import DecisionTreeRegressor

# Specify the model.
# For model reproducibility, set a numeric value for random_state when specifying the model
iowa_model = DecisionTreeRegressor(random_state=55)

# Fit the model
iowa_model.fit(X, y)

# Check your answer
step_3.check()

Correct

# step_3.hint()
step_3.solution()

Solution:

from sklearn.tree import DecisionTreeRegressor
iowa_model = DecisionTreeRegressor(random_state=1)
iowa_model.fit(X, y)
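
The answer above uses random_state=55 and the solution uses random_state=1. Any fixed integer serves the purpose: it pins down the random choices the tree makes while fitting (for example, which of several equally good splits to take), so repeated runs give the same model. The sketch below checks this on hypothetical toy data (X_toy and y_toy are made up for illustration).

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Hypothetical toy data: columns are lot area and year built (illustrative only)
X_toy = np.array([
    [8450, 2003],
    [9600, 1976],
    [11250, 2001],
    [9550, 1915],
    [14260, 2000],
])
y_toy = np.array([208500, 181500, 223500, 140000, 250000])

# Two models built with the same random_state and fitted on the same data
model_a = DecisionTreeRegressor(random_state=55).fit(X_toy, y_toy)
model_b = DecisionTreeRegressor(random_state=55).fit(X_toy, y_toy)

# Their predictions should match exactly
same_predictions = np.array_equal(model_a.predict(X_toy), model_b.predict(X_toy))
print(same_predictions)
```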

Step 4: Make Predictions

Make predictions with the model's predict method, using X as the data. Save the results to a variable called predictions.

predictions = iowa_model.predict(X)
print(predictions)

# Check your answer
step_4.check()

[208500. 181500. 223500. ... 266500. 142125. 147500.]

Correct

# step_4.hint()
step_4.solution()

Solution:

iowa_model.predict(X)

Think About Your Results

Use the head method to compare the top few predictions to the actual home values (in y) for those same homes. Anything surprising?

# You can write code in this cell
from sklearn.metrics import mean_squared_log_error

# scikit-learn's argument order is (y_true, y_pred)
mean_squared_log_error(y, predictions)

2.885347464787131e-05
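
An error this close to zero is the surprise the prompt hints at: the model is being scored on the same rows it was fitted on, and a decision tree with default settings keeps splitting until it reproduces the training targets. The sketch below puts predictions next to actual values using head, on hypothetical toy data (toy_X, toy_y, and the values are assumptions for illustration, not the course data).

```python
import pandas as pd
from sklearn.tree import DecisionTreeRegressor

# Hypothetical toy data with distinct feature rows (illustrative only)
toy_X = pd.DataFrame({
    "LotArea": [8450, 9600, 11250, 9550, 14260],
    "YearBuilt": [2003, 1976, 2001, 1915, 2000],
})
toy_y = pd.Series([208500, 181500, 223500, 140000, 250000], name="SalePrice")

# Fit on the toy data, then predict on the same rows
toy_model = DecisionTreeRegressor(random_state=1).fit(toy_X, toy_y)
toy_predictions = toy_model.predict(toy_X)

# Put actual and predicted values side by side and look at the first rows
comparison = pd.DataFrame({"actual": toy_y, "predicted": toy_predictions})
print(comparison.head())
```

On the exercise data the match is nearly, but not perfectly, exact. One plausible reason is that a few homes share identical feature values but sold for different prices, in which case the tree predicts their average.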

It's natural to ask how accurate the model's predictions will be and how you can improve that. That will be your next step.

Keep Going

You are ready for Model Validation.
