This notebook is an exercise in the Pandas course. You can reference the tutorial at this link.
Introduction
Run the following cell to load your data and some utility functions.
import pandas as pd
reviews = pd.read_csv("../input/wine-reviews/winemag-data-130k-v2.csv", index_col=0)
from learntools.core import binder; binder.bind(globals())
from learntools.pandas.renaming_and_combining import *
print("Setup complete.")
Setup complete.
Exercises
View the first several lines of your data by running the cell below:
reviews.head()
| | country | description | designation | points | price | province | region_1 | region_2 | taster_name | taster_twitter_handle | title | variety | winery |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Italy | Aromas include tropical fruit, broom, brimston... | Vulkà Bianco | 87 | NaN | Sicily & Sardinia | Etna | NaN | Kerin O’Keefe | @kerinokeefe | Nicosia 2013 Vulkà Bianco (Etna) | White Blend | Nicosia |
| 1 | Portugal | This is ripe and fruity, a wine that is smooth... | Avidagos | 87 | 15.0 | Douro | NaN | NaN | Roger Voss | @vossroger | Quinta dos Avidagos 2011 Avidagos Red (Douro) | Portuguese Red | Quinta dos Avidagos |
| 2 | US | Tart and snappy, the flavors of lime flesh and... | NaN | 87 | 14.0 | Oregon | Willamette Valley | Willamette Valley | Paul Gregutt | @paulgwine | Rainstorm 2013 Pinot Gris (Willamette Valley) | Pinot Gris | Rainstorm |
| 3 | US | Pineapple rind, lemon pith and orange blossom ... | Reserve Late Harvest | 87 | 13.0 | Michigan | Lake Michigan Shore | NaN | Alexander Peartree | NaN | St. Julian 2013 Reserve Late Harvest Riesling ... | Riesling | St. Julian |
| 4 | US | Much like the regular bottling from 2012, this... | Vintner's Reserve Wild Child Block | 87 | 65.0 | Oregon | Willamette Valley | Willamette Valley | Paul Gregutt | @paulgwine | Sweet Cheeks 2012 Vintner's Reserve Wild Child... | Pinot Noir | Sweet Cheeks |
1.
region_1 and region_2 are pretty uninformative names for locale columns in the dataset. Create a copy of reviews with these columns renamed to region and locale, respectively.
# Your code here
#renamed = ____
renamed = reviews.rename(columns={'region_1': 'region', 'region_2': 'locale'})
# Check your answer
q1.check()
Correct
#q1.hint()
q1.solution()
Solution:
renamed = reviews.rename(columns=dict(region_1='region', region_2='locale'))
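As a side note, rename never modifies the original DataFrame; it returns a new one. A minimal sketch on a small toy frame (not the graded dataset) showing both points:

```python
import pandas as pd

# Toy frame standing in for the wine reviews data.
df = pd.DataFrame({"region_1": ["Etna", "Douro"], "region_2": ["Sicily", None]})

# Pass a {old: new} mapping for columns; df itself is left untouched.
renamed = df.rename(columns={"region_1": "region", "region_2": "locale"})
print(list(renamed.columns))  # ['region', 'locale']
print(list(df.columns))       # ['region_1', 'region_2']
```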
2.
Set the index name in the dataset to wines.
#reindexed = ____
reindexed = reviews.rename_axis("wines", axis="rows")
# Check your answer
q2.check()
Correct
#q2.hint()
q2.solution()
Solution:
reindexed = reviews.rename_axis('wines', axis='rows')
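rename_axis names the axis itself rather than any individual label. A quick illustration on a toy frame (not the graded dataset): axis="rows" (equivalently axis=0) targets the row index, and the name shows up as index.name.

```python
import pandas as pd

df = pd.DataFrame({"points": [87, 87]})

# Name the row index "wines"; the column axis could be named with axis="columns".
named = df.rename_axis("wines", axis="rows")
print(named.index.name)  # wines
```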
3.
The Things on Reddit dataset includes product links from a selection of top-ranked forums ("subreddits") on reddit.com. Run the cell below to load a dataframe of products mentioned on the /r/gaming subreddit and another dataframe for products mentioned on the /r/movies subreddit.
gaming_products = pd.read_csv("../input/things-on-reddit/top-things/top-things/reddits/g/gaming.csv")
gaming_products['subreddit'] = "r/gaming"
movie_products = pd.read_csv("../input/things-on-reddit/top-things/top-things/reddits/m/movies.csv")
movie_products['subreddit'] = "r/movies"
Create a DataFrame of products mentioned on either subreddit.
#combined_products = ____
combined_products = pd.concat([gaming_products, movie_products])
# Check your answer
q3.check()
Correct
#q3.hint()
q3.solution()
Solution:
combined_products = pd.concat([gaming_products, movie_products])
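pd.concat stacks the frames vertically and keeps each frame's original row labels, so the combined index can contain duplicates; passing ignore_index=True renumbers the rows instead. A small sketch with made-up product rows (not the real Reddit data):

```python
import pandas as pd

a = pd.DataFrame({"name": ["controller"], "subreddit": ["r/gaming"]})
b = pd.DataFrame({"name": ["popcorn maker"], "subreddit": ["r/movies"]})

# Without ignore_index both rows would keep label 0; with it, rows are 0..n-1.
combined = pd.concat([a, b], ignore_index=True)
print(len(combined), list(combined.index))  # 2 [0, 1]
```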
4.
The Powerlifting Database dataset on Kaggle includes one CSV table for powerlifting meets and a separate one for powerlifting competitors. Run the cell below to load these datasets into dataframes:
Kaggle 上的 举重数据库 数据集包括一张用于举重比赛的 CSV 表和一张单独的举重参赛者表格。 运行下面的单元格将这些数据集加载到DataFrame中:
powerlifting_meets = pd.read_csv("../input/powerlifting-database/meets.csv")
powerlifting_competitors = pd.read_csv("../input/powerlifting-database/openpowerlifting.csv")
Both tables include references to a MeetID, a unique key for each meet (competition) included in the database. Using this, generate a dataset combining the two tables into one.
#powerlifting_combined = ____
# join() matches rows on the index, so set MeetID as the index of both frames first
powerlifting_combined = powerlifting_meets.set_index("MeetID").join(powerlifting_competitors.set_index("MeetID"))
# Check your answer
q4.check()
Correct
# q4.hint()
q4.solution()
Hint: Use pd.DataFrame.join().
Solution:
powerlifting_combined = powerlifting_meets.set_index("MeetID").join(powerlifting_competitors.set_index("MeetID"))
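The same combination can also be expressed with pd.merge, which takes an explicit key column instead of requiring the key to be on the index. A sketch with tiny invented tables (the column names other than MeetID are hypothetical):

```python
import pandas as pd

meets = pd.DataFrame({"MeetID": [1, 2], "MeetName": ["Open A", "Open B"]})
lifters = pd.DataFrame({"MeetID": [1, 1, 2], "Name": ["Ann", "Bo", "Cy"]})

# join(): match on the index, so reindex both frames by MeetID first.
via_join = meets.set_index("MeetID").join(lifters.set_index("MeetID"))

# merge(): match on a named column directly; same rows result.
via_merge = pd.merge(meets, lifters, on="MeetID")
print(len(via_join), len(via_merge))  # 3 3
```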
Congratulations!
You've finished the Pandas micro-course. Many data scientists consider fluency with Pandas their most useful and practical skill, because it lets them make quick progress on any project.
If you'd like to apply your new skills to examining geospatial data, you're encouraged to check out our Geospatial Analysis micro-course.
You can also take advantage of your Pandas skills by entering a Kaggle Competition or by answering a question you find interesting using Kaggle Datasets.