This notebook is an exercise in the Introduction to Machine Learning course. You can reference the tutorial at this link.
Recap
So far, you have loaded your data and reviewed it with the following code. Run this cell to set up your coding environment where the previous step left off.
# Code you have previously used to load data
import pandas as pd
# Path of the file to read
iowa_file_path = '../input/home-data-for-ml-course/train.csv'
home_data = pd.read_csv(iowa_file_path)
# Set up code checking
from learntools.core import binder
binder.bind(globals())
from learntools.machine_learning.ex3 import *
print("Setup Complete")
Setup Complete
Exercises
Step 1: Specify Prediction Target
Select the target variable, which corresponds to the sales price. Save this to a new variable called y. You'll need to print a list of the columns to find the name of the column you need.
# print the list of columns in the dataset to find the name of the prediction target
home_data.info()
RangeIndex: 1460 entries, 0 to 1459
Data columns (total 81 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 Id 1460 non-null int64
1 MSSubClass 1460 non-null int64
2 MSZoning 1460 non-null object
3 LotFrontage 1201 non-null float64
4 LotArea 1460 non-null int64
5 Street 1460 non-null object
6 Alley 91 non-null object
7 LotShape 1460 non-null object
8 LandContour 1460 non-null object
9 Utilities 1460 non-null object
10 LotConfig 1460 non-null object
11 LandSlope 1460 non-null object
12 Neighborhood 1460 non-null object
13 Condition1 1460 non-null object
14 Condition2 1460 non-null object
15 BldgType 1460 non-null object
16 HouseStyle 1460 non-null object
17 OverallQual 1460 non-null int64
18 OverallCond 1460 non-null int64
19 YearBuilt 1460 non-null int64
20 YearRemodAdd 1460 non-null int64
21 RoofStyle 1460 non-null object
22 RoofMatl 1460 non-null object
23 Exterior1st 1460 non-null object
24 Exterior2nd 1460 non-null object
25 MasVnrType 588 non-null object
26 MasVnrArea 1452 non-null float64
27 ExterQual 1460 non-null object
28 ExterCond 1460 non-null object
29 Foundation 1460 non-null object
30 BsmtQual 1423 non-null object
31 BsmtCond 1423 non-null object
32 BsmtExposure 1422 non-null object
33 BsmtFinType1 1423 non-null object
34 BsmtFinSF1 1460 non-null int64
35 BsmtFinType2 1422 non-null object
36 BsmtFinSF2 1460 non-null int64
37 BsmtUnfSF 1460 non-null int64
38 TotalBsmtSF 1460 non-null int64
39 Heating 1460 non-null object
40 HeatingQC 1460 non-null object
41 CentralAir 1460 non-null object
42 Electrical 1459 non-null object
43 1stFlrSF 1460 non-null int64
44 2ndFlrSF 1460 non-null int64
45 LowQualFinSF 1460 non-null int64
46 GrLivArea 1460 non-null int64
47 BsmtFullBath 1460 non-null int64
48 BsmtHalfBath 1460 non-null int64
49 FullBath 1460 non-null int64
50 HalfBath 1460 non-null int64
51 BedroomAbvGr 1460 non-null int64
52 KitchenAbvGr 1460 non-null int64
53 KitchenQual 1460 non-null object
54 TotRmsAbvGrd 1460 non-null int64
55 Functional 1460 non-null object
56 Fireplaces 1460 non-null int64
57 FireplaceQu 770 non-null object
58 GarageType 1379 non-null object
59 GarageYrBlt 1379 non-null float64
60 GarageFinish 1379 non-null object
61 GarageCars 1460 non-null int64
62 GarageArea 1460 non-null int64
63 GarageQual 1379 non-null object
64 GarageCond 1379 non-null object
65 PavedDrive 1460 non-null object
66 WoodDeckSF 1460 non-null int64
67 OpenPorchSF 1460 non-null int64
68 EnclosedPorch 1460 non-null int64
69 3SsnPorch 1460 non-null int64
70 ScreenPorch 1460 non-null int64
71 PoolArea 1460 non-null int64
72 PoolQC 7 non-null object
73 Fence 281 non-null object
74 MiscFeature 54 non-null object
75 MiscVal 1460 non-null int64
76 MoSold 1460 non-null int64
77 YrSold 1460 non-null int64
78 SaleType 1460 non-null object
79 SaleCondition 1460 non-null object
80 SalePrice 1460 non-null int64
dtypes: float64(3), int64(35), object(43)
memory usage: 924.0+ KB
y = home_data['SalePrice']
# Check your answer
step_1.check()
Correct
# The lines below will show you a hint or the solution.
# step_1.hint()
step_1.solution()
Solution:
y = home_data.SalePrice
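As a minimal sketch of the step above, the snippet below lists the column names and then selects the target in both notations. It assumes a tiny made-up DataFrame (`toy_data`, a hypothetical stand-in for `home_data` with illustrative values) so that it runs on its own; only the column names come from the real dataset.

```python
import pandas as pd

# Hypothetical stand-in for home_data; values are illustrative only.
toy_data = pd.DataFrame({
    "LotArea": [8450, 9600, 11250],
    "SalePrice": [208500, 181500, 223500],
})

# .columns is a compact way to list the column names
# (info() prints the same names plus counts and dtypes).
print(list(toy_data.columns))

# Bracket notation and dot notation select the same column as a Series.
y_bracket = toy_data["SalePrice"]
y_dot = toy_data.SalePrice
print(y_bracket.equals(y_dot))
```

Bracket notation is the more general of the two, since it also works for column names that are not valid Python identifiers.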
Step 2: Create X
Now you will create a DataFrame called X holding the predictive features.
Since you want only some columns from the original data, you'll first create a list with the names of the columns you want in X.
You'll use just the following columns in the list (you can copy and paste the whole list to save some typing, though you'll still need to add quotes):
- LotArea
- YearBuilt
- 1stFlrSF
- 2ndFlrSF
- FullBath
- BedroomAbvGr
- TotRmsAbvGrd
After you've created that list of features, use it to create the DataFrame that you'll use to fit the model.
# Create the list of features below
feature_names = ['LotArea', 'YearBuilt','1stFlrSF','2ndFlrSF','FullBath','BedroomAbvGr','TotRmsAbvGrd']
# Select data corresponding to features in feature_names
X = home_data[feature_names]
# Check your answer
step_2.check()
Correct
# step_2.hint()
step_2.solution()
Solution:
feature_names = ["LotArea", "YearBuilt", "1stFlrSF", "2ndFlrSF",
"FullBath", "BedroomAbvGr", "TotRmsAbvGrd"]
X=home_data[feature_names]
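The selection above can be sketched on a small scale as follows. The example assumes a hypothetical `toy_data` frame with illustrative values; it shows that indexing with a list of names returns a DataFrame containing only those columns, and that the target column is left out of the features.

```python
import pandas as pd

# Hypothetical stand-in for home_data; values are illustrative only.
toy_data = pd.DataFrame({
    "LotArea": [8450, 9600],
    "1stFlrSF": [856, 1262],
    "SalePrice": [208500, 181500],
})

# A list of column names selects several columns at once.
feature_names = ["LotArea", "1stFlrSF"]
X_toy = toy_data[feature_names]

print(type(X_toy).__name__)   # a DataFrame, not a Series
print(list(X_toy.columns))    # only the requested columns, in list order
```

Names such as `1stFlrSF` start with a digit, so they can only be reached with bracket notation, which is one reason to prefer a list of quoted names here.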
Review Data
Before building a model, take a quick look at X to verify that it looks sensible.
# Review data
# Print summary statistics for X
print(X.describe())
# Print the first few rows
print(X.head())
LotArea YearBuilt 1stFlrSF 2ndFlrSF FullBath \
count 1460.000000 1460.000000 1460.000000 1460.000000 1460.000000
mean 10516.828082 1971.267808 1162.626712 346.992466 1.565068
std 9981.264932 30.202904 386.587738 436.528436 0.550916
min 1300.000000 1872.000000 334.000000 0.000000 0.000000
25% 7553.500000 1954.000000 882.000000 0.000000 1.000000
50% 9478.500000 1973.000000 1087.000000 0.000000 2.000000
75% 11601.500000 2000.000000 1391.250000 728.000000 2.000000
max 215245.000000 2010.000000 4692.000000 2065.000000 3.000000
BedroomAbvGr TotRmsAbvGrd
count 1460.000000 1460.000000
mean 2.866438 6.517808
std 0.815778 1.625393
min 0.000000 2.000000
25% 2.000000 5.000000
50% 3.000000 6.000000
75% 3.000000 7.000000
max 8.000000 14.000000
LotArea YearBuilt 1stFlrSF 2ndFlrSF FullBath BedroomAbvGr \
0 8450 2003 856 854 2 3
1 9600 1976 1262 0 2 3
2 11250 2001 920 866 2 3
3 9550 1915 961 756 1 3
4 14260 2000 1145 1053 2 4
TotRmsAbvGrd
0 8
1 6
2 6
3 7
4 9
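One further check that may be worth adding to the review is a count of missing values per column, since many scikit-learn estimators do not accept them. The sketch below assumes a hypothetical `toy_X` frame with illustrative values; `LotFrontage` is included only because the `info()` output earlier shows it has fewer non-null entries than rows, while the seven features chosen for X are all complete.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for a feature table; values are illustrative only.
toy_X = pd.DataFrame({
    "LotArea": [8450, 9600, 11250],
    "LotFrontage": [65.0, np.nan, 68.0],
})

# isnull() marks missing cells; sum() counts them per column.
missing_per_column = toy_X.isnull().sum()
print(missing_per_column)
```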
Step 3: Specify and Fit Model
Create a DecisionTreeRegressor and save it as iowa_model. Ensure you've done the relevant import from sklearn to run this command.
Then fit the model you just created using the data in X and y that you saved above.
from sklearn.tree import DecisionTreeRegressor
# Specify the model.
# For model reproducibility, set a numeric value for random_state when specifying the model
iowa_model = DecisionTreeRegressor(random_state=55)
# Fit the model
iowa_model.fit(X,y)
# Check your answer
step_3.check()
Correct
# step_3.hint()
step_3.solution()
Solution:
from sklearn.tree import DecisionTreeRegressor
iowa_model = DecisionTreeRegressor(random_state=1)
iowa_model.fit(X, y)
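To illustrate what setting random_state buys you, here is a minimal sketch that fits two trees with the same seed and confirms they give identical predictions. The data (`toy_X`, `toy_y`) are hypothetical, with illustrative values; the specific seed does not matter (the cell above uses 55 and the solution uses 1), only that it is fixed.

```python
import pandas as pd
from sklearn.tree import DecisionTreeRegressor

# Hypothetical stand-ins for X and y; values are illustrative only.
toy_X = pd.DataFrame({
    "LotArea": [8450, 9600, 11250, 9550],
    "YearBuilt": [2003, 1976, 2001, 1915],
})
toy_y = pd.Series([208500, 181500, 223500, 140000], name="SalePrice")

# fit() returns the fitted estimator, so construction and fitting can be chained.
model_a = DecisionTreeRegressor(random_state=1).fit(toy_X, toy_y)
model_b = DecisionTreeRegressor(random_state=1).fit(toy_X, toy_y)

# Same data and same seed should yield the same predictions.
same = bool((model_a.predict(toy_X) == model_b.predict(toy_X)).all())
print(same)
```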
Step 4: Make Predictions
Make predictions with the model's predict command using X as the data. Save the results to a variable called predictions.
predictions = iowa_model.predict(X)
print(predictions)
# Check your answer
step_4.check()
[208500. 181500. 223500. ... 266500. 142125. 147500.]
Correct
# step_4.hint()
step_4.solution()
Solution:
iowa_model.predict(X)
Think About Your Results
Use the head method to compare the top few predictions to the actual home values (in y) for those same homes. Anything surprising?
# You can write code in this cell
from sklearn.metrics import mean_squared_log_error
# scikit-learn expects the true values first, then the predictions
mean_squared_log_error(y, predictions)
2.885347464787131e-05
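The head comparison suggested above can be sketched as follows. Since predict returns a NumPy array, which has no head method, this sketch places the actual and predicted values side by side in a DataFrame first. The data and the names `toy_X`, `toy_y`, `toy_model`, and `comparison` are hypothetical, with illustrative values.

```python
import pandas as pd
from sklearn.tree import DecisionTreeRegressor

# Hypothetical stand-ins for X and y; values are illustrative only.
toy_X = pd.DataFrame({
    "LotArea": [8450, 9600, 11250, 9550, 14260],
    "YearBuilt": [2003, 1976, 2001, 1915, 2000],
})
toy_y = pd.Series([208500, 181500, 223500, 140000, 250000], name="SalePrice")

toy_model = DecisionTreeRegressor(random_state=1)
toy_model.fit(toy_X, toy_y)

# Predict on the same rows the model was trained on.
toy_predictions = toy_model.predict(toy_X)

# Put actual and predicted values in one table so head() can show both.
comparison = pd.DataFrame({"actual": toy_y, "predicted": toy_predictions})
print(comparison.head())
```

In this toy case every row has distinct features, so an unconstrained tree reproduces its training targets exactly. That is likely why the error above is so close to zero: the model is being scored on the data it was fitted to, which says little about how it performs on homes it has not seen.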
It's natural to ask how accurate the model's predictions will be and how you can improve that. That will be your next step.
Keep Going
You are ready for Model Validation.