Flashield's Blog

Just For My Daily Diary

07.exercise-machine-learning-competitions【Exercise: Machine Learning Competitions】

This notebook is an exercise in the Introduction to Machine Learning course. You can reference the tutorial at this link.


Introduction

In this exercise, you will create and submit predictions for a Kaggle competition. You can then improve your model (e.g., by adding features) and see how you stack up against others taking this course.

The steps in this notebook are:

  1. Build a Random Forest model with all of your data (X and y).
  2. Read in the "test" data, which doesn't include values for the target. Predict home values in the test data with your Random Forest model.
  3. Submit those predictions to the competition and see your score.
  4. Optionally, come back to see if you can improve your model by adding features or changing your model. Then you can resubmit to see how that stacks up on the competition leaderboard.
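The first three steps can be sketched end to end as below. This is a minimal sketch on a small synthetic dataset, not the course's train.csv and test.csv: the two feature columns, the `Id` column in the test frame, and the two-column (Id, SalePrice) submission layout are assumptions for illustration, so check them against the competition's own sample submission file before submitting.

```python
import io

import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor

# Hypothetical stand-ins for the real train.csv / test.csv files.
rng = np.random.default_rng(0)
features = ["LotArea", "YearBuilt"]
train = pd.DataFrame({
    "LotArea": rng.integers(5000, 15000, size=50),
    "YearBuilt": rng.integers(1950, 2010, size=50),
})
train["SalePrice"] = 10 * train["LotArea"] + 500 * (train["YearBuilt"] - 1950)
test = pd.DataFrame({
    "Id": np.arange(1, 11),  # assumed identifier column, as in a typical Kaggle test file
    "LotArea": rng.integers(5000, 15000, size=10),
    "YearBuilt": rng.integers(1950, 2010, size=10),
})

# Step 1: fit on every labelled row (no validation split at this stage).
model = RandomForestRegressor(random_state=1)
model.fit(train[features], train["SalePrice"])

# Step 2: the test data has no SalePrice, so predict it from the same feature columns.
test_preds = model.predict(test[features])

# Step 3: pair each prediction with its row's Id and write a CSV without the index column.
submission = pd.DataFrame({"Id": test["Id"], "SalePrice": test_preds})
buffer = io.StringIO()  # in practice, pass a file path such as "submission.csv"
submission.to_csv(buffer, index=False)
csv_text = buffer.getvalue()
print(csv_text.splitlines()[0])  # header line: Id,SalePrice
```

Passing `index=False` keeps pandas from writing its row index as an extra unnamed first column, which would otherwise appear in the submitted file.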

Recap

Here's the code you've written so far. Start by running it again.

# Code you have previously used to load data
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

# Set up code checking
import os
if not os.path.exists("../input/train.csv"):
    os.symlink("../input/home-data-for-ml-course/train.csv", "../input/train.csv")  
    os.symlink("../input/home-data-for-ml-course/test.csv", "../input/test.csv") 
from learntools.core import binder
binder.bind(globals())
from learntools.machine_learning.ex7 import *

# Path of the file to read. We changed the directory structure to simplify submitting to a competition
iowa_file_path = '../input/train.csv'

home_data = pd.read_csv(iowa_file_path)
# Create target object and call it y
y = home_data.SalePrice
# Create X
features = ['LotArea', 'YearBuilt', '1stFlrSF', '2ndFlrSF', 'FullBath', 'BedroomAbvGr', 'TotRmsAbvGrd']
X = home_data[features]

# Split into validation and training data
train_X, val_X, train_y, val_y = train_test_split(X, y, random_state=1)

# Specify Model
iowa_model = DecisionTreeRegressor(random_state=1)
# Fit Model
iowa_model.fit(train_X, train_y)

# Make validation predictions and calculate mean absolute error
val_predictions = iowa_model.predict(val_X)
val_mae = mean_absolute_error(val_y, val_predictions)
print("Validation MAE when not specifying max_leaf_nodes: {:,.0f}".format(val_mae))

# Using best value for max_leaf_nodes
iowa_model = DecisionTreeRegressor(max_leaf_nodes=100, random_state=1)
iowa_model.fit(train_X, train_y)
val_predictions = iowa_model.predict(val_X)
val_mae = mean_absolute_error(val_y, val_predictions)
print("Validation MAE for best value of max_leaf_nodes: {:,.0f}".format(val_mae))

# Define the model. Set random_state to 1
rf_model = RandomForestRegressor(random_state=1)
rf_model.fit(train_X, train_y)
rf_val_predictions = rf_model.predict(val_X)
rf_val_mae = mean_absolute_error(val_y, rf_val_predictions)

print("Validation MAE for Random Forest Model: {:,.0f}".format(rf_val_mae))

Validation MAE when not specifying max_leaf_nodes: 29,653
Validation MAE for best value of max_leaf_nodes: 27,283
Validation MAE for Random Forest Model: 21,857
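The recap hard-codes `max_leaf_nodes=100` as the "best value". One common way to arrive at such a value is to fit one tree per candidate setting and keep the setting with the lowest validation MAE, sketched below. The synthetic data, the candidate list, and the `get_mae` helper name are illustrative assumptions; whichever value wins on this toy data says nothing about the Iowa housing data.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

# Hypothetical synthetic regression data standing in for X and y.
rng = np.random.default_rng(1)
X = rng.uniform(0, 10, size=(500, 3))
y = 3 * X[:, 0] + 5 * np.sin(X[:, 1]) + rng.normal(0, 1, size=500)
train_X, val_X, train_y, val_y = train_test_split(X, y, random_state=1)


def get_mae(max_leaf_nodes, train_X, val_X, train_y, val_y):
    """Fit a tree limited to max_leaf_nodes leaves and return its validation MAE."""
    model = DecisionTreeRegressor(max_leaf_nodes=max_leaf_nodes, random_state=1)
    model.fit(train_X, train_y)
    preds = model.predict(val_X)
    # scikit-learn's order is (y_true, y_pred); MAE itself is symmetric.
    return mean_absolute_error(val_y, preds)


candidate_max_leaf_nodes = [5, 25, 50, 100, 250, 500]
scores = {n: get_mae(n, train_X, val_X, train_y, val_y) for n in candidate_max_leaf_nodes}
# Pick the candidate whose validation MAE is smallest.
best_tree_size = min(scores, key=scores.get)

for n, mae in scores.items():
    print(f"max_leaf_nodes={n:>4}: validation MAE {mae:.3f}")
print("best:", best_tree_size)
```

Too few leaves underfits and too many overfits, so the validation MAE typically falls and then rises across the candidates; the minimum marks the trade-off point for this particular split.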

Creating a Model For the Competition

Build a Random Forest model and train it on all of X and y.

# To improve accuracy, create a new Random Forest model which you will train on all training data
rf_model_on_full_data = RandomForestRegressor(random_state=0)

# fit rf_model_on_full_data on all data from the training data
rf_model_on_full_data.fit(X, y)
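Fitting on `X` and `y` rather than `train_X` and `train_y` matters because `train_test_split` holds out 25% of the rows by default (when neither `test_size` nor `train_size` is given), so the validation-stage models never learned from those rows. The sketch below uses a hypothetical 100-row frame in place of `home_data` to show the difference in training-set size; once validation has been used to choose a model, refitting on every labelled row lets the final model use all of them.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical 100-row dataset standing in for home_data.
X = pd.DataFrame({"LotArea": range(100), "YearBuilt": range(1900, 2000)})
y = pd.Series(range(100), name="SalePrice")

# Default split: test_size=0.25, so 25 of the 100 rows go to validation.
train_X, val_X, train_y, val_y = train_test_split(X, y, random_state=1)

print(len(train_X), len(val_X), len(X))  # 75 25 100
```

The trade-off is that a model fit on all the data has no held-out rows left for measuring its own error, which is why the validation MAE is computed before this final refit.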