This notebook is an exercise in the Introduction to Machine Learning course. You can refer to the accompanying tutorial.
Recap
Here's the code you've written so far.
# Code you have previously used to load data
import pandas as pd
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor
# Path of the file to read
iowa_file_path = '../input/home-data-for-ml-course/train.csv'
home_data = pd.read_csv(iowa_file_path)
# Create target object and call it y
y = home_data.SalePrice
# Create X
features = ['LotArea', 'YearBuilt', '1stFlrSF', '2ndFlrSF', 'FullBath', 'BedroomAbvGr', 'TotRmsAbvGrd']
X = home_data[features]
# Split into validation and training data
train_X, val_X, train_y, val_y = train_test_split(X, y, random_state=1)
# Specify Model
iowa_model = DecisionTreeRegressor(random_state=1)
# Fit Model
iowa_model.fit(train_X, train_y)
# Make validation predictions and calculate mean absolute error
val_predictions = iowa_model.predict(val_X)
# mean_absolute_error expects (y_true, y_pred); MAE is symmetric, so the value is unchanged
val_mae = mean_absolute_error(val_y, val_predictions)
print("Validation MAE when not specifying max_leaf_nodes: {:,.0f}".format(val_mae))
# Refit using the best value found for max_leaf_nodes (the one with the lowest validation MAE)
iowa_model = DecisionTreeRegressor(max_leaf_nodes=100, random_state=1)
iowa_model.fit(train_X, train_y)
val_predictions = iowa_model.predict(val_X)
val_mae = mean_absolute_error(val_y, val_predictions)
print("Validation MAE for best value of max_leaf_nodes: {:,.0f}".format(val_mae))
# Set up code checking
from learntools.core import binder
binder.bind(globals())
from learntools.machine_learning.ex6 import *
print("\nSetup complete")
Validation MAE when not specifying max_leaf_nodes: 29,653
Validation MAE for best value of max_leaf_nodes: 27,283
Setup complete
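The max_leaf_nodes=100 setting above is described as the best value, meaning the tree size that gave the lowest validation MAE among the sizes tried. A minimal sketch of that kind of search follows. It assumes synthetic data from scikit-learn's make_regression in place of the Iowa housing file, a hypothetical helper named validation_mae, and an arbitrary list of candidate sizes, so the size it picks will not match the course's value.

```python
from sklearn.datasets import make_regression
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor


def validation_mae(max_leaf_nodes, train_X, val_X, train_y, val_y):
    """Hypothetical helper: fit a tree with the given leaf limit and return its validation MAE."""
    model = DecisionTreeRegressor(max_leaf_nodes=max_leaf_nodes, random_state=1)
    model.fit(train_X, train_y)
    return mean_absolute_error(val_y, model.predict(val_X))


# Synthetic stand-in for the Iowa housing data (an assumption, for illustration only)
X, y = make_regression(n_samples=500, n_features=7, noise=20.0, random_state=1)
train_X, val_X, train_y, val_y = train_test_split(X, y, random_state=1)

# Arbitrary candidate tree sizes (assumed, not taken from the course)
candidate_max_leaf_nodes = [5, 25, 50, 100, 250, 500]
scores = {n: validation_mae(n, train_X, val_X, train_y, val_y) for n in candidate_max_leaf_nodes}

# Keep the leaf limit with the lowest validation MAE
best_tree_size = min(scores, key=scores.get)
print("Validation MAE by max_leaf_nodes:", {n: round(mae) for n, mae in scores.items()})
print("Best max_leaf_nodes:", best_tree_size)
```

Too few leaves underfit and too many overfit, so validation MAE is typically lowest somewhere in between. That is why the choice is made on held-out data rather than on the training set.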
Exercises
Data science isn't always this easy. But replacing the decision tree with a Random Forest is going to be an easy win.
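A random forest fits many decision trees on bootstrap samples of the training rows. For regression it predicts by averaging the trees' outputs, which usually reduces the variance of a single deep tree. The sketch below checks that averaging on synthetic data from make_regression, which is an assumption in place of the housing data. The choice of n_estimators=50 is arbitrary; the library default is 100.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor

# Synthetic regression data (an assumption, for illustration only)
X, y = make_regression(n_samples=300, n_features=7, noise=20.0, random_state=0)

# n_estimators=50 is an arbitrary choice here; the library default is 100
forest = RandomForestRegressor(n_estimators=50, random_state=1)
forest.fit(X, y)

# Each fitted tree is stored in forest.estimators_; average their predictions by hand
per_tree = np.stack([tree.predict(X) for tree in forest.estimators_])
manual_average = per_tree.mean(axis=0)

print(per_tree.shape)  # (50, 300): one row of predictions per tree
print(np.allclose(manual_average, forest.predict(X)))  # True
```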
Step 1: Use a Random Forest
from sklearn.ensemble import RandomForestRegressor
# Define the model. Set random_state to 1
rf_model = RandomForestRegressor(random_state=1)
# fit your model
rf_model.fit(train_X, train_y)
# Calculate the mean absolute error of your Random Forest model on the validation data
rf_val_mae = mean_absolute_error(val_y, rf_model.predict(val_X))
print("Validation MAE for Random Forest Model: {}".format(rf_val_mae))
# Check your answer
step_1.check()
Validation MAE for Random Forest Model: 21857.15912981083
Correct
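The validation score used throughout is the mean absolute error, MAE = (1/n) Σ |y_i − ŷ_i|: the average size of the prediction errors, in the same units as SalePrice. A small worked check follows, comparing a hand computation with scikit-learn's mean_absolute_error. The prices and predictions in it are made up for illustration.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error

# Made-up sale prices and predictions, for illustration only
y_true = np.array([200_000, 150_000, 310_000, 120_000])
y_pred = np.array([190_000, 165_000, 300_000, 128_000])

# MAE = mean of |actual - predicted|: (10000 + 15000 + 10000 + 8000) / 4
manual_mae = np.mean(np.abs(y_true - y_pred))
print(manual_mae)                            # 10750.0
print(mean_absolute_error(y_true, y_pred))   # 10750.0
```

scikit-learn's signature is mean_absolute_error(y_true, y_pred). Because the absolute difference is symmetric, swapping the arguments gives the same MAE, but the documented order is the safer habit.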
# The lines below will show you a hint or the solution.
# step_1.hint()
step_1.solution()
Solution:
rf_model = RandomForestRegressor(random_state=1)
# fit your model
rf_model.fit(train_X, train_y)
# Calculate the mean absolute error of your Random Forest model on the validation data
rf_val_predictions = rf_model.predict(val_X)
rf_val_mae = mean_absolute_error(val_y, rf_val_predictions)
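The exercise sets random_state=1 because a random forest is randomized: the bootstrap samples and the trees' internal randomness both draw from that seed. Fixing it makes the fitted forest, and therefore the validation MAE, repeatable across runs. The sketch below demonstrates this on synthetic data from make_regression, which is an assumption. The helper fit_predict and the choice of n_estimators=20 are hypothetical and chosen only to keep the example fast.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor

# Synthetic regression data (an assumption, for illustration only)
X, y = make_regression(n_samples=200, n_features=7, noise=20.0, random_state=0)


def fit_predict(seed):
    """Hypothetical helper: fit a small forest with the given seed and predict on X."""
    model = RandomForestRegressor(n_estimators=20, random_state=seed)
    model.fit(X, y)
    return model.predict(X)


same_a = fit_predict(1)
same_b = fit_predict(1)
other = fit_predict(2)

print(np.array_equal(same_a, same_b))  # True: identical seed, identical forest
print(np.array_equal(same_a, other))   # a different seed generally yields a different forest
```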
So far, you have followed specific instructions at each step of your project. This helped you learn key ideas and build your first model, but now you know enough to try things on your own.
Machine Learning competitions are a great way to try your own ideas and learn more as you independently navigate a machine learning project.
Keep Going
You are ready for Machine Learning Competitions.