Machine Learning Course Home Page
This exercise will test your ability to read a data file and understand statistics about the data.
In later exercises, you will apply techniques to filter the data, build a machine learning model, and iteratively improve your model.
The course examples use data from Melbourne. To make sure you can apply these techniques on your own, you will apply them to a new dataset of house prices from Iowa.
The exercises use a "notebook" coding environment. In case you are unfamiliar with notebooks, we have a 90-second intro video.
Exercises
Run the following cell to set up code-checking, which will verify your work as you go.
# Set up code checking
from learntools.core import binder
binder.bind(globals())
from learntools.machine_learning.ex2 import *
print("Setup Complete")
Setup Complete
Step 1: Loading Data
Read the Iowa data file into a Pandas DataFrame called home_data.
import pandas as pd
# Path of the file to read
iowa_file_path = '../input/home-data-for-ml-course/train.csv'
# Fill in the line below to read the file into a variable home_data
home_data = pd.read_csv(iowa_file_path)
# Call line below with no argument to check that you've loaded the data correctly
step_1.check()
Correct
# Lines below will give you a hint or solution code
#step_1.hint()
step_1.solution()
Solution:
home_data = pd.read_csv(iowa_file_path)
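After loading a file it can help to confirm that the DataFrame has the shape and columns you expect. The sketch below is only an illustration: it reads a tiny in-memory CSV instead of train.csv, and the rows and the name sample_data are hypothetical stand-ins, not the course data.

```python
import io

import pandas as pd

# Hypothetical stand-in for train.csv: three illustrative rows
# using column names that also appear in the Iowa data.
sample_csv = io.StringIO(
    "Id,LotArea,YearBuilt\n"
    "1,8450,2003\n"
    "2,9600,1976\n"
    "3,11250,2001\n"
)

# pd.read_csv accepts a file path or a file-like object.
sample_data = pd.read_csv(sample_csv)

# Quick sanity checks: dimensions, column names, first rows.
print(sample_data.shape)
print(list(sample_data.columns))
print(sample_data.head())
```

The same three checks work unchanged on home_data once the real file has been read.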
Step 2: Review The Data
Use the command you learned to view summary statistics of the data. Then fill in the variables to answer the following questions.
# Print an overview of the data in the next line (info() lists dtypes and non-null counts; describe() gives summary statistics)
home_data.info()
RangeIndex: 1460 entries, 0 to 1459
Data columns (total 81 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 Id 1460 non-null int64
1 MSSubClass 1460 non-null int64
2 MSZoning 1460 non-null object
3 LotFrontage 1201 non-null float64
4 LotArea 1460 non-null int64
5 Street 1460 non-null object
6 Alley 91 non-null object
7 LotShape 1460 non-null object
8 LandContour 1460 non-null object
9 Utilities 1460 non-null object
10 LotConfig 1460 non-null object
11 LandSlope 1460 non-null object
12 Neighborhood 1460 non-null object
13 Condition1 1460 non-null object
14 Condition2 1460 non-null object
15 BldgType 1460 non-null object
16 HouseStyle 1460 non-null object
17 OverallQual 1460 non-null int64
18 OverallCond 1460 non-null int64
19 YearBuilt 1460 non-null int64
20 YearRemodAdd 1460 non-null int64
21 RoofStyle 1460 non-null object
22 RoofMatl 1460 non-null object
23 Exterior1st 1460 non-null object
24 Exterior2nd 1460 non-null object
25 MasVnrType 588 non-null object
26 MasVnrArea 1452 non-null float64
27 ExterQual 1460 non-null object
28 ExterCond 1460 non-null object
29 Foundation 1460 non-null object
30 BsmtQual 1423 non-null object
31 BsmtCond 1423 non-null object
32 BsmtExposure 1422 non-null object
33 BsmtFinType1 1423 non-null object
34 BsmtFinSF1 1460 non-null int64
35 BsmtFinType2 1422 non-null object
36 BsmtFinSF2 1460 non-null int64
37 BsmtUnfSF 1460 non-null int64
38 TotalBsmtSF 1460 non-null int64
39 Heating 1460 non-null object
40 HeatingQC 1460 non-null object
41 CentralAir 1460 non-null object
42 Electrical 1459 non-null object
43 1stFlrSF 1460 non-null int64
44 2ndFlrSF 1460 non-null int64
45 LowQualFinSF 1460 non-null int64
46 GrLivArea 1460 non-null int64
47 BsmtFullBath 1460 non-null int64
48 BsmtHalfBath 1460 non-null int64
49 FullBath 1460 non-null int64
50 HalfBath 1460 non-null int64
51 BedroomAbvGr 1460 non-null int64
52 KitchenAbvGr 1460 non-null int64
53 KitchenQual 1460 non-null object
54 TotRmsAbvGrd 1460 non-null int64
55 Functional 1460 non-null object
56 Fireplaces 1460 non-null int64
57 FireplaceQu 770 non-null object
58 GarageType 1379 non-null object
59 GarageYrBlt 1379 non-null float64
60 GarageFinish 1379 non-null object
61 GarageCars 1460 non-null int64
62 GarageArea 1460 non-null int64
63 GarageQual 1379 non-null object
64 GarageCond 1379 non-null object
65 PavedDrive 1460 non-null object
66 WoodDeckSF 1460 non-null int64
67 OpenPorchSF 1460 non-null int64
68 EnclosedPorch 1460 non-null int64
69 3SsnPorch 1460 non-null int64
70 ScreenPorch 1460 non-null int64
71 PoolArea 1460 non-null int64
72 PoolQC 7 non-null object
73 Fence 281 non-null object
74 MiscFeature 54 non-null object
75 MiscVal 1460 non-null int64
76 MoSold 1460 non-null int64
77 YrSold 1460 non-null int64
78 SaleType 1460 non-null object
79 SaleCondition 1460 non-null object
80 SalePrice 1460 non-null int64
dtypes: float64(3), int64(35), object(43)
memory usage: 924.0+ KB
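The info() output above lists column types and non-null counts, but not means or maxima. A minimal sketch of reading those values from describe() instead, using a small hypothetical DataFrame (the name toy and its values are made up for illustration):

```python
import pandas as pd

# Hypothetical toy data; only the column names match the Iowa file.
toy = pd.DataFrame({
    "LotArea": [8000, 10000, 12000, 14000],
    "YearBuilt": [1950, 1980, 2000, 2010],
})

# describe() returns one row per statistic (count, mean, std, min,
# quartiles, max) and one column per numeric column.
summary = toy.describe()
print(summary)

# Individual statistics can be looked up by row and column label.
avg_lot_size = round(summary.loc["mean", "LotArea"])
newest_year_built = int(summary.loc["max", "YearBuilt"])
print(avg_lot_size, newest_year_built)
```

Looking values up with .loc avoids copying numbers by hand from the printed table.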
import time
# What is the average lot size (rounded to the nearest integer)?
avg_lot_size = round(home_data['LotArea'].mean())
# As of today, how old is the newest home (current year minus the year it was built)?
newest_home_age = (time.localtime().tm_year - home_data['YearBuilt']).min()
print(avg_lot_size, newest_home_age)
# Checks your answers
step_2.check()
10517 14
Correct
#step_2.hint()
step_2.solution()
Solution:
# using data read from home_data.describe()
avg_lot_size = 10517
newest_home_age = 14
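Note that the age of the newest home depends on the year in which the notebook is run, so a hard-coded value only matches for one calendar year. The sketch below makes that dependence explicit; the function signature, the reference_year parameter, and the toy values are assumptions for illustration, not part of the exercise.

```python
from datetime import date

import pandas as pd


def newest_home_age(year_built, reference_year=None):
    """Age in years of the most recently built home, relative to reference_year."""
    if reference_year is None:
        # Default to the current calendar year.
        reference_year = date.today().year
    return reference_year - int(year_built.max())


# Hypothetical construction years.
toy_years = pd.Series([1950, 1980, 2000, 2010], name="YearBuilt")

# A fixed reference year gives a reproducible answer.
print(newest_home_age(toy_years, reference_year=2024))

# Without it, the result changes as the calendar year advances.
print(newest_home_age(toy_years))
```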
Think About Your Data
The newest house in your data isn't that new. A few potential explanations for this:
- They haven't built new houses where this data was collected.
- The data was collected a long time ago. Houses built after the data publication wouldn't show up.
If the reason is explanation #1 above, does that affect your trust in the model you build with this data? What about if it is reason #2?
How could you dig into the data to see which explanation is more plausible?
Check out this discussion thread to see what others think or to add your ideas.
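One possible way to compare the two explanations is to check the latest sale year against the latest construction year. This is a rough sketch on hypothetical toy values, not a result from the Iowa data; YearBuilt and YrSold are columns listed in the info() output, while toy, gap, and builds_per_year are names introduced here.

```python
import pandas as pd

# Hypothetical toy data; values are made up for illustration.
toy = pd.DataFrame({
    "YearBuilt": [1965, 1998, 2005, 2007, 2009, 2010],
    "YrSold": [2006, 2007, 2008, 2009, 2010, 2010],
})

latest_build = int(toy["YearBuilt"].max())
latest_sale = int(toy["YrSold"].max())
gap = latest_sale - latest_build
print(latest_build, latest_sale, gap)

# Number of homes built in each of the most recent construction years.
builds_per_year = toy["YearBuilt"].value_counts().sort_index()
print(builds_per_year.tail())
```

If the last recorded sale falls in or near the last construction year, homes were still being built when data collection stopped, which points toward explanation #2. A long gap between the newest home and the last sale would instead suggest that building had stopped, as in explanation #1.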
Keep Going
You are ready for Your First Machine Learning Model.