Flashield's Blog

Just For My Daily Diary

Flashield's Blog

Just For My Daily Diary

02.exercise-select-from-where【练习:SELECT、FROM和WHERE】

This notebook is an exercise in the SQL course. You can reference the tutorial at this link.


Introduction

介绍

Try writing some SELECT statements of your own to explore a large dataset of air pollution measurements.

尝试编写一些自己的 SELECT 语句来探索空气污染测量的大型数据集。

Run the cell below to set up the feedback system.

运行下面的单元格来设置反馈系统。

# Set up feedback system
from learntools.core import binder
binder.bind(globals())
from learntools.sql.ex2 import *
print("Setup Complete")
Using Kaggle's public dataset BigQuery integration.
Setup Complete

The code cell below fetches the global_air_quality table from the openaq dataset. We also preview the first five rows of the table.

下面的代码单元从openaq数据集中获取global_air_quality表。 我们还预览表的前五行。

from google.cloud import bigquery

# Create a "Client" object
client = bigquery.Client()

# Construct a reference to the "openaq" dataset
dataset_ref = client.dataset("openaq", project="bigquery-public-data")

# API request - fetch the dataset
dataset = client.get_dataset(dataset_ref)

# Construct a reference to the "global_air_quality" table
table_ref = dataset_ref.table("global_air_quality")

# API request - fetch the table
table = client.get_table(table_ref)

# Preview the first five lines of the "global_air_quality" table
client.list_rows(table, max_results=5).to_dataframe()
Using Kaggle's public dataset BigQuery integration.
location city country pollutant value timestamp unit source_name latitude longitude averaged_over_in_hours location_geom
0 Borówiec, ul. Drapałka Borówiec PL bc 0.85217 2022-04-28 07:00:00+00:00 µg/m³ GIOS 1.0 52.276794 17.074114 POINT(52.276794 1)
1 Kraków, ul. Bulwarowa Kraków PL bc 0.91284 2022-04-27 23:00:00+00:00 µg/m³ GIOS 1.0 50.069308 20.053492 POINT(50.069308 1)
2 Płock, ul. Reja Płock PL bc 1.41000 2022-03-30 04:00:00+00:00 µg/m³ GIOS 1.0 52.550938 19.709791 POINT(52.550938 1)
3 Elbląg, ul. Bażyńskiego Elbląg PL bc 0.33607 2022-05-03 13:00:00+00:00 µg/m³ GIOS 1.0 54.167847 19.410942 POINT(54.167847 1)
4 Piastów, ul. Pułaskiego Piastów PL bc 0.51000 2022-05-11 05:00:00+00:00 µg/m³ GIOS 1.0 52.191728 20.837489 POINT(52.191728 1)

Exercises

练习

1) Units of measurement

1) 测量单位

Which countries have reported pollution levels in units of "ppm"? In the code cell below, set first_query to an SQL query that pulls the appropriate entries from the country column.

哪些国家以ppm为单位报告污染水平? 在下面的代码单元中,将first_query设置为从country列中提取适当条目的 SQL 查询。

In case it's useful to see an example query, here's some code from the tutorial:

如果查看示例查询有用,请参阅本教程中的一些代码:

query = """
        SELECT city
        FROM `bigquery-public-data.openaq.global_air_quality`
        WHERE country = 'US'
        """
# Query to select countries with units of "ppm"
first_query = """
            SELECT distinct country
            FROM `bigquery-public-data.openaq.global_air_quality`
            WHERE unit = 'ppm'
            """ # Your code goes here

# Set up the query (cancel the query if it would use too much of 
# your quota, with the limit set to 10 GB)
safe_config = bigquery.QueryJobConfig(maximum_bytes_billed=10**10)
first_query_job = client.query(first_query, job_config=safe_config)

# API request - run the query, and return a pandas DataFrame
first_results = first_query_job.to_dataframe()

# View top few rows of results
print(first_results.head())

# Check your answer
q_1.check()
  country
0      AR
1      IL
2      TW
3      CO
4      EC

Correct

For the solution, uncomment the line below.

q_1.solution()

Solution:


first_query = """
              SELECT country
              FROM `bigquery-public-data.openaq.global_air_quality`
              WHERE unit = "ppm"
              """

# Or to get each country just once, you could use
first_query = """
              SELECT DISTINCT country
              FROM `bigquery-public-data.openaq.global_air_quality`
              WHERE unit = "ppm"
              """

2) High air quality

2) 空气质量高

Which pollution levels were reported to be exactly 0?

据报告,哪些污染水平恰好为 0?

  • Set zero_pollution_query to select all columns of the rows where the value column is 0.
  • 设置zero_pollution_query以选择value列为 0 的行的所有列
  • Set zero_pollution_results to a pandas DataFrame containing the query results.
  • zero_pollution_results 设置为包含查询结果的 pandas DataFrame。
# Query to select all columns where pollution levels are exactly 0
zero_pollution_query = """
            SELECT *
            FROM `bigquery-public-data.openaq.global_air_quality`
            WHERE value = 0
            """ # Your code goes here

# Set up the query
safe_config = bigquery.QueryJobConfig(maximum_bytes_billed=10**10)
query_job = client.query(zero_pollution_query, job_config=safe_config)

# API request - run the query and return a pandas DataFrame
zero_pollution_results = query_job.to_dataframe() # Your code goes here

# print(zero_pollution_results.head())
zero_pollution_results.head()
location city country pollutant value timestamp unit source_name latitude longitude averaged_over_in_hours location_geom
0 Żary, ul. Szymanowskiego 8 Żary PL bc 0.0 2022-05-05 02:00:00+00:00 µg/m³ GIOS 1.0 51.642656 15.127808 POINT(51.642656 1)
1 Starachowice, ul. Złota Starachowice PL bc 0.0 2022-05-08 11:00:00+00:00 µg/m³ GIOS 1.0 51.050611 21.084175 POINT(51.050611 1)
2 Kraków, ul. Bulwarowa Kraków PL bc 0.0 2022-05-07 13:00:00+00:00 µg/m³ GIOS 1.0 50.069308 20.053492 POINT(50.069308 1)
3 Zielonka, Bory Tucholskie Zielonka PL bc 0.0 2022-05-15 11:00:00+00:00 µg/m³ GIOS 1.0 53.662136 17.933986 POINT(53.662136 1)
4 Żagań, ul. Kochanowskiego Żagań PL bc 0.0 2022-05-02 13:00:00+00:00 µg/m³ GIOS 1.0 51.615447 15.301667 POINT(51.615447 1)
# Check your answer
q_2.check()

Correct

For the solution, uncomment the line below.

对于解决方案,请取消注释下面的行。

q_2.solution()

Solution:


zero_pollution_query = """
                       SELECT *
                       FROM `bigquery-public-data.openaq.global_air_quality`
                       WHERE value = 0
                       """

safe_config = bigquery.QueryJobConfig(maximum_bytes_billed=10**10)
query_job = client.query(zero_pollution_query, job_config=safe_config)

zero_pollution_results = query_job.to_dataframe()

That query wasn't too complicated, and it got the data you want. But these SELECT queries don't organizing data in a way that answers the most interesting questions. For that, we'll need the GROUP BY command.

该查询并不太复杂,并且它得到了您想要的数据。 但这些 SELECT 查询不会以回答最有趣问题的方式组织数据。 为此,我们需要 GROUP BY 命令。

If you know how to use groupby() in pandas, this is similar. But BigQuery works quickly with far larger datasets.

如果您知道如何在 pandas 中使用 groupby(),则这是类似的。 但 BigQuery 可以快速处理更大的数据集。

Fortunately, that's next.

幸运的是,接下来就是这样了。

Keep going

继续前进

GROUP BY clauses and their extensions give you the power to pull interesting statistics out of data, rather than receiving it in just its raw format.

GROUP BY 语句及其扩展使您能够从数据中提取有趣的统计数据,而不是接收数据 只是它的原始格式。

02.exercise-select-from-where【练习:SELECT、FROM和WHERE】

Leave a Reply

Your email address will not be published. Required fields are marked *

Scroll to top