GCP/BigQuery on GCP

BigQuery Region ์ด๊ด€(๋ณ€๊ฒฝ) ํ”„๋กœ์ ํŠธ

whistory 2022. 9. 6. 08:46
๋ฐ˜์‘ํ˜•

 

๐Ÿ’กGCP BigQuery ์˜ Region๊ฐ„ ์ด๋™ (US to Seoul) ๊ฒฝํ—˜์„ ๊ณต์œ ํ•œ๋‹ค.

 

 

 

US Rgeion ์— ์ž˜๋ชป ์ƒ์„ฑํ•ด ์šด์˜์ค‘์ธ BigQuery Dataset์„ Seoul Region ์œผ๋กœ ์˜ฎ๊ธฐ๋ ค ํ•œ๋‹ค.

 

๊ธฐ ์ƒ์„ฑ๋œ Dataset์˜ Region ์ •๋ณด๋งŒ ๋ณ€๊ฒฝํ•˜๊ณ  ์‹ถ์ง€๋งŒ, ํ•œ๋ฒˆ ์ƒ์„ฑํ•œ Dataset์˜ Region ๋ณ€๊ฒฝ์€ ๋ถˆ๊ฐ€๋Šฅํ•˜๋‹ค.

 

์ด๋ฏธ US์— ์ƒ์„ฑ๋˜์–ด ๋ฐ์ดํ„ฐ๊ฐ€ ์Œ“์ด๊ณ  ์žˆ๊ธฐ๋•Œ๋ฌธ์—

๋‹ค์‹œ Seoul Region์— dataset์„ ์ƒ์„ฑํ•˜๊ณ , ๋ฐ์ดํ„ฐ๋ฅผ ์˜ฎ๊ฒจ์•ผํ•œ๋‹ค.

 

 

๊ทธ๋ž˜์„œ BigQuery Data Transfer Service (์ดํ•˜ DTS) ๋ฅผ ์ด์šฉํ•˜์—ฌ Region ๋ณ€๊ฒฝ์„ ์ง„ํ–‰ํ•˜๋ ค ํ•œ๋‹ค.

์„ ํ–‰์ž‘์—…์œผ๋กœ BigQuery Admin ๊ถŒํ•œ์„ ๋ถ€์—ฌํ•˜๋ฉด, DTS ๊ถŒํ•œ์ด ํฌํ•จ๋œ๋‹ค.

  • bigquery.transfers.update
  • bigquery.transfers.get

BigQuery Admin ๊ณ„์ •์„ ๊ฐ€์ง„ SA๋ฅผ ์ƒ์„ฑํ•˜์—ฌ

Google Cloud Console์— uploadํ•œ๋‹ค.

 

 

- GCP ์—์„œ ์ œ๊ณตํ•˜๋Š” UI๋ฅผ ์ด์šฉํ•œ ๋ณต์ œ

GCP ์ฝ˜์†”์˜ BIgQuery ์—์„œ UI ํ˜•ํƒœ๋กœ ์ œ๊ณตํ•˜์ง€๋งŒ,

Dataset 1๊ฐœ์”ฉ ์ˆ˜๋™์œผ๋กœ job์„ ์ƒ์„ฑํ•ด์•ผ ํ•จ์œผ๋กœ, ๋ณต์ œํ•  dataset ์ˆ˜๊ฐ€ ๋งŽ์€ ๊ฒฝ์šฐ์—๋Š” python ์„ ์ด์šฉํ•˜์—ฌ ํ”„๋กœ์ ํŠธ ์ „์ฒด์˜ dataset ๋ณต์ œ๋ฅผ ์ง„ํ–‰ํ•˜๋Š” ํŽธ์ด ์ˆ˜์›”ํ•˜๋‹ค.

 

 

 

 

 

 

 

 

- Cloud Shell ์—์„œ, Python์„ ์ด์šฉํ•œ ๋ณต์ œ

Region์„ ๋ณ€๊ฒฝํ•˜๋ ค๋ฉด,

  1. ์›๋ณธ์˜ dataset(US)์˜ ๋ฐฑ์—…๋ณธ(Seoul)์„ ์ƒ์„ฑํ•œ๋‹ค.
  2. ์›๋ณธ dataset(US)์„ ์‚ญ์ œํ•œ๋‹ค.
  3. ๋ณต์ œ๋œ dataset(Seoul)์„ ์›๋ณธdataset(Seoul)์˜ ์ด๋ฆ„์œผ๋กœ ๋ณต์ œํ•œ๋‹ค.

 

์ด๊ด€ ์‹œ dataset ์—์„œ procedure, scheduled query, view table, external table, model ๋“ค์„ ํ™•์ธํ•ด์•ผ ํ•œ๋‹ค.

 

DTS๋ฅผ ์ด์šฉํ•ด dataset์„ ๋ณต์‚ฌํ• ๋•Œ skip ๋˜๋Š” ๋ถ€๋ถ„๋“ค์ด๋‹ค.

 

1. routine (procedure)

2. scheduled query

# status 0 : active
# sataus 4 : close

from google.cloud import bigquery_datatransfer
from google.cloud import bigquery
import google.api_core.exceptions
import os
import sys

try:
    arg = sys.argv[1]
except IndexError:
    print("The project is required.")
    sys.exit(1)

if ( arg == "1" ):      # project #1
    key_path = "/home/whiseung/workspace/keys/project1.json"
elif  ( arg == "2" ):   # project #2
    key_path = "/home/whiseung/workspace/keys/project2.json"
elif  ( arg == "3" ):   # project #3
    key_path = "/home/whiseung/workspace/keys/project3.json"
else :
    print("The project does not exist.")
    sys.exit(1)

transfer_client = bigquery_datatransfer.DataTransferServiceClient.from_service_account_json(key_path)
client = bigquery.Client.from_service_account_json(key_path)
project = client.project

## us region
parent = transfer_client.common_project_path(project)
## seoul region
# parent = transfer_client.common_project_path(project + "/locations/asia-northeast3")


print(parent)
configs = transfer_client.list_transfer_configs(parent=parent)

print("Configs:")
for config in configs:
    if ( config.destination_dataset_id != "" ):
        print(f"\tDataset: {config.destination_dataset_id}.{config.display_name}, state: {config.state}, Schedule: {config.schedule}" )

3. model : model ์€ list๋งŒ ์ถ”์ถœํ•˜๋Š” code

from google.cloud import bigquery
import datetime

print("Capture Time : " + str(datetime.datetime.now() + datetime.timedelta(hours=9, minutes=0, seconds=0)))

keys = ["/home/whiseung/workspace/keys/project1.json"
        ,"/home/whiseung/workspace/keys/project2.json"
        ,"/home/whiseung/workspace/keys/project3.json"]

for key in keys:
    client = bigquery.Client.from_service_account_json(key)
    datasets = list(client.list_datasets())  # Make an API request.
    project = client.project

    if datasets:
        for dataset in datasets:
            dst = dataset.dataset_id
            
            ## ๋ฐฑ์—…์ด ์•„๋‹Œ dataset๋งŒ
            if ( str(dataset.dataset_id).find('_bak') < 0 and str(dataset.dataset_id).find('_cloocus_backup') < 0 and str(dataset.dataset_id).find('_US') < 0 ):
                models = client.list_models(dst)  # Make an API request.

                for model in models:
                    full_model_id = "{},{},{}".format(
                        model.project, model.dataset_id, model.model_id
                    )
                    friendly_name = model.friendly_name
                    print("{}".format(full_model_id))

4. view table : view table ddl๋ฌธ ์ถ”์ถœํ•˜๋Š” code

from google.cloud import bigquery
import datetime
import sys

def keySetting():
    try:
        arg = sys.argv[1]
    except IndexError:
        print("The project is required.")
        sys.exit(1)

   if ( arg == "1" ):      # project #1
        key_path = "/home/whiseung/workspace/keys/project1.json"
    elif  ( arg == "2" ):   # project #2
        key_path = "/home/whiseung/workspace/keys/project2.json"
    elif  ( arg == "3" ):   # project #3
        key_path = "/home/whiseung/workspace/keys/project3.json"
    else :
    	print("The project does not exist.")
    	sys.exit(1)

    return key_path


key_path = keySetting()

client = bigquery.Client.from_service_account_json(key_path)
datasets = list(client.list_datasets())  # Make an API request.
project = client.project

if datasets:
    print("-- Capture Time : " + str(datetime.datetime.now() + datetime.timedelta(hours=9, minutes=0, seconds=0)))
    for dataset in datasets:
        dst = dataset.dataset_id
        ## ๋ฐฑ์—…์ด ์•„๋‹Œ dataset๋งŒ
        if ( str(dataset.dataset_id).find('_bak') < 0 ):
            query_job_not_like = client.query("SELECT table_catalog, table_schema, table_name, table_type, ddl FROM `{}.{}.INFORMATION_SCHEMA.TABLES`  WHERE table_type = 'VIEW' ORDER BY  table_catalog, table_schema, table_name, table_type ".format(project, dst))
            results = query_job_not_like.result()
            for row in results:
                print("-- {}.{}.{}".format(row[0],row[1],row[2]))
                print("{}".format(row[4]))

5. external table : external table ddl ์ถ”์ถœํ•˜๋Š” code

from google.cloud import bigquery
import datetime
import sys

def keySetting():
    try:
        arg = sys.argv[1]
    except IndexError:
        print("The project is required.")
        sys.exit(1)

    if ( arg == "1" ):      # project #1
        key_path = "/home/whiseung/workspace/keys/project1.json"
    elif  ( arg == "2" ):   # project #2
        key_path = "/home/whiseung/workspace/keys/project2.json"
    elif  ( arg == "3" ):   # project #3
        key_path = "/home/whiseung/workspace/keys/project3.json"
    else :
        print("The project does not exist.")
        sys.exit(1)

    return key_path

key_path = keySetting()

client = bigquery.Client.from_service_account_json(key_path)
datasets = list(client.list_datasets())  # Make an API request.
project = client.project


if datasets:
    print("-- Capture Time : " + str(datetime.datetime.now() + datetime.timedelta(hours=9, minutes=0, seconds=0)))
    for dataset in datasets:
        dst = dataset.dataset_id
        ## ๋ฐฑ์—…์ด ์•„๋‹Œ dataset๋งŒ
        if ( str(dataset.dataset_id).find('_bak') < 0 ):
            query_job_not_like = client.query("SELECT table_catalog, table_schema, table_name, table_type, ddl FROM `{}.{}.INFORMATION_SCHEMA.TABLES`  WHERE table_type = 'EXTERNAL' ORDER BY  table_catalog, table_schema, table_name, table_type ".format(project, dst))
            results = query_job_not_like.result()
            for row in results:
                print("-- {}.{}.{}".format(row[0],row[1],row[2]))
                print("{}".format(row[4]))

 

 

- Python ์ฝ”๋“œ ์ƒ˜ํ”Œ

https://whiseung.tistory.com/entry/BigQuery-Data-Transfer-Service

 

BigQuery Data Transfer Service ์ƒ˜ํ”Œ ์ฝ”๋“œ

๋ฐ์ดํ„ฐ ์„ธํŠธ ๋ณต์‚ฌ | BigQuery Data Transfer Service | Google Cloud ๋ฐ์ดํ„ฐ ์„ธํŠธ ๋ณต์‚ฌ | BigQuery Data Transfer Service | Google Cloud ์ „์†ก ๊ตฌ์„ฑ์„ ๋งŒ๋“ค์–ด ํ”„๋กœ์ ํŠธ, ์œ„์น˜ ๋˜๋Š” ๋‘˜ ๋‹ค์—์„œ ๋ฐ์ดํ„ฐ ์„ธํŠธ์˜ ๋ชจ๋“  ํ…Œ์ด๋ธ”์„ ๋ณต

whiseung.tistory.com

- python ์ฝ”๋“œ ์‹คํ–‰

https://whiseung.tistory.com/entry/Google-Cloud-Shell-%EA%B0%84%EB%8B%A8-%EC%82%AC%EC%9A%A9%EB%B2%95

 

Google Cloud Shell ๊ฐ„๋‹จ ์‚ฌ์šฉ๋ฒ•

BigQuery ์ด๊ด€์„ ์œ„ํ•ด, Python ์ฝ”๋“œ๋ฅผ ์‹คํ–‰ํ•ด์•ผ ํ–ˆ๋‹ค. ์ฝ˜์†”์˜ ์šฐ์ธก์ƒ๋‹จ์˜ ๋ฒ„ํŠผ์„ ํ†ตํ•ด ์ ‘๊ทผํ•˜๋Š” ๋ฐฉ๋ฒ•์ด ์žˆ๊ณ , ์„œ๋น„์Šค ๊ฒ€์ƒ‰์„ ํ†ตํ•ด ์ ‘๊ทผํ•˜๋Š” ๋ฐฉ๋ฒ•์ด ์žˆ๋‹ค. VScode๋ฅผ ์‚ฌ์šฉํ•ด ๋ดค์œผ๋ฉด, ์ต์ˆ™ํ•œ ํ™”๋ฉด์ผ ๊ฒƒ์ด

whiseung.tistory.com

 

๋ฐ˜์‘ํ˜•