Flashield's Blog

Just For My Daily Diary

06.exercise-renaming-and-combining [Exercise: Renaming and Combining]

This notebook is an exercise in the Pandas course. You can reference the tutorial at this link.


Introduction

Run the following cell to load your data and some utility functions.

import pandas as pd

reviews = pd.read_csv("../input/wine-reviews/winemag-data-130k-v2.csv", index_col=0)

from learntools.core import binder; binder.bind(globals())
from learntools.pandas.renaming_and_combining import *
print("Setup complete.")
Setup complete.

Exercises

View the first several lines of your data by running the cell below:

reviews.head()
| | country | description | designation | points | price | province | region_1 | region_2 | taster_name | taster_twitter_handle | title | variety | winery |
| 0 | Italy | Aromas include tropical fruit, broom, brimston... | Vulkà Bianco | 87 | NaN | Sicily & Sardinia | Etna | NaN | Kerin O’Keefe | @kerinokeefe | Nicosia 2013 Vulkà Bianco (Etna) | White Blend | Nicosia |
| 1 | Portugal | This is ripe and fruity, a wine that is smooth... | Avidagos | 87 | 15.0 | Douro | NaN | NaN | Roger Voss | @vossroger | Quinta dos Avidagos 2011 Avidagos Red (Douro) | Portuguese Red | Quinta dos Avidagos |
| 2 | US | Tart and snappy, the flavors of lime flesh and... | NaN | 87 | 14.0 | Oregon | Willamette Valley | Willamette Valley | Paul Gregutt | @paulgwine | Rainstorm 2013 Pinot Gris (Willamette Valley) | Pinot Gris | Rainstorm |
| 3 | US | Pineapple rind, lemon pith and orange blossom ... | Reserve Late Harvest | 87 | 13.0 | Michigan | Lake Michigan Shore | NaN | Alexander Peartree | NaN | St. Julian 2013 Reserve Late Harvest Riesling ... | Riesling | St. Julian |
| 4 | US | Much like the regular bottling from 2012, this... | Vintner's Reserve Wild Child Block | 87 | 65.0 | Oregon | Willamette Valley | Willamette Valley | Paul Gregutt | @paulgwine | Sweet Cheeks 2012 Vintner's Reserve Wild Child... | Pinot Noir | Sweet Cheeks |

1.

region_1 and region_2 are pretty uninformative names for locale columns in the dataset. Create a copy of reviews with these columns renamed to region and locale, respectively.

# Your code here
#renamed = ____

renamed = reviews.rename(columns={'region_1': 'region', 'region_2': 'locale'})

# Check your answer
q1.check()

Correct

#q1.hint()
q1.solution()

Solution:

renamed = reviews.rename(columns=dict(region_1='region', region_2='locale'))
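
The two spellings above (a dict literal and the dict(...) keyword form) should behave the same way. As a minimal sketch, assuming a small made-up table named toy in place of the real wine reviews data, the following shows that rename() returns a renamed copy and leaves the original untouched:

```python
import pandas as pd

# Hypothetical stand-in for the wine reviews table (not the real data).
toy = pd.DataFrame({
    "country": ["Italy", "US"],
    "region_1": ["Etna", "Willamette Valley"],
    "region_2": ["Sicily", "Oregon Other"],
})

# rename() returns a new DataFrame; the original keeps its column names.
toy_renamed = toy.rename(columns={"region_1": "region", "region_2": "locale"})

# The keyword form used in the official solution builds the same mapping.
same_result = toy.rename(columns=dict(region_1="region", region_2="locale"))

print(list(toy.columns))          # ['country', 'region_1', 'region_2']
print(list(toy_renamed.columns))  # ['country', 'region', 'locale']
print(toy_renamed.equals(same_result))  # True
```

Column names missing from the mapping (country here) are simply left as they are.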

2.

Set the index name in the dataset to wines.

#reindexed = ____

reindexed = reviews.rename_axis("wines", axis="rows")
# Check your answer
q2.check()

Correct

#q2.hint()
q2.solution()

Solution:

reindexed = reviews.rename_axis('wines', axis='rows')
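
To see what rename_axis actually changes, here is a small sketch on a made-up two-row frame (toy and the name "fields" are illustrative assumptions, not part of the exercise). It labels the row index, leaves the data alone, and can label the column axis in the same way:

```python
import pandas as pd

# Hypothetical miniature table; the values are invented for illustration.
toy = pd.DataFrame({"points": [87, 90], "price": [15.0, 65.0]})

# Name the row index. The data and the index labels are unchanged.
labeled = toy.rename_axis("wines", axis="rows")

# The column axis can carry a name too ("fields" is an arbitrary choice).
labeled_both = labeled.rename_axis("fields", axis="columns")

print(labeled.index.name)         # wines
print(toy.index.name)             # None (the original is not modified)
print(labeled_both.columns.name)  # fields
```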

3.

The Things on Reddit dataset includes product links from a selection of top-ranked forums ("subreddits") on reddit.com. Run the cell below to load a dataframe of products mentioned on the /r/gaming subreddit and another dataframe for products mentioned on the /r/movies subreddit.

gaming_products = pd.read_csv("../input/things-on-reddit/top-things/top-things/reddits/g/gaming.csv")
gaming_products['subreddit'] = "r/gaming"
movie_products = pd.read_csv("../input/things-on-reddit/top-things/top-things/reddits/m/movies.csv")
movie_products['subreddit'] = "r/movies"

Create a DataFrame of products mentioned on either subreddit.

#combined_products = ____

combined_products = pd.concat([gaming_products, movie_products])
# Check your answer
q3.check()

Correct

#q3.hint()
q3.solution()

Solution:

combined_products = pd.concat([gaming_products, movie_products])
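
One detail worth knowing about pd.concat is that it stacks rows and keeps each frame's original index labels, so labels can repeat in the result. A minimal sketch, assuming two tiny invented frames (gaming_toy and movies_toy are hypothetical names, not the Kaggle data):

```python
import pandas as pd

# Hypothetical miniature versions of the two product tables.
gaming_toy = pd.DataFrame({"name": ["Controller", "Headset"], "subreddit": "r/gaming"})
movies_toy = pd.DataFrame({"name": ["Projector"], "subreddit": "r/movies"})

# Default behaviour: rows are stacked and index labels are kept as-is.
stacked = pd.concat([gaming_toy, movies_toy])
print(stacked.index.tolist())     # [0, 1, 0]

# ignore_index=True assigns a fresh 0..n-1 index instead.
renumbered = pd.concat([gaming_toy, movies_toy], ignore_index=True)
print(renumbered.index.tolist())  # [0, 1, 2]
```

Whether to renumber depends on whether the original row labels carry meaning; the exercise solution keeps them.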

4.

The Powerlifting Database dataset on Kaggle includes one CSV table for powerlifting meets and a separate one for powerlifting competitors. Run the cell below to load these datasets into dataframes:

powerlifting_meets = pd.read_csv("../input/powerlifting-database/meets.csv")
powerlifting_competitors = pd.read_csv("../input/powerlifting-database/openpowerlifting.csv")

Both tables include a MeetID column, a unique key for each meet (competition) in the database. Using this key, combine the two tables into a single dataset.

#powerlifting_combined = ____
# Index both tables by MeetID so that join() aligns rows on the shared key.
# (join(..., on="MeetID") without indexing the right table would match against its default row index, not its MeetID column.)
powerlifting_combined = powerlifting_meets.set_index("MeetID").join(powerlifting_competitors.set_index("MeetID"))
# Check your answer
q4.check()

Correct

# q4.hint()
q4.solution()

Hint: Use pd.DataFrame.join().

Solution:

powerlifting_combined = powerlifting_meets.set_index("MeetID").join(powerlifting_competitors.set_index("MeetID"))
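
The reason both tables are indexed by MeetID first is that DataFrame.join matches on the index of the other frame. As a rough sketch with invented data (meets_toy, competitors_toy, and their values are assumptions for illustration only), a meet with several competitors produces one output row per competitor:

```python
import pandas as pd

# Hypothetical miniature versions of the meets and competitors tables.
meets_toy = pd.DataFrame({"MeetID": [0, 1], "MeetName": ["Open A", "Open B"]})
competitors_toy = pd.DataFrame({"MeetID": [0, 0, 1], "Name": ["Ann", "Bo", "Cy"]})

# Move the shared key into the index on both sides, then join on it.
combined = meets_toy.set_index("MeetID").join(competitors_toy.set_index("MeetID"))

print(combined.shape)               # (3, 2)
print(combined.index.tolist())      # [0, 0, 1]
print(combined["Name"].tolist())    # ['Ann', 'Bo', 'Cy']
```

If the two tables shared any other column name, join would raise an error unless lsuffix or rsuffix were supplied to disambiguate them.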

Congratulations!

You've finished the Pandas micro-course. Many data scientists feel that proficiency with Pandas is the most useful and practical skill they have, because it lets them make quick progress on any project.

If you'd like to apply your new skills to examining geospatial data, you're encouraged to check out our Geospatial Analysis micro-course.

You can also take advantage of your Pandas skills by entering a Kaggle Competition or by answering a question you find interesting using Kaggle Datasets.
