💡 Sharing my experience migrating a GCP BigQuery dataset between regions (US to Seoul).
A BigQuery dataset was mistakenly created in the US region and is now in operation; I want to move it to the Seoul region.
Ideally I would change only the region of the existing dataset, but once a dataset has been created its region cannot be changed.
Since the dataset already exists in US and holds data,
a new dataset has to be created in the Seoul region and the data moved into it.
So I use the BigQuery Data Transfer Service (DTS) to carry out the region change.
As a prerequisite, granting the BigQuery Admin role also covers the DTS permissions:
- bigquery.transfers.update
- bigquery.transfers.get
Create a service account (SA) with the BigQuery Admin role and
upload its key file to the Google Cloud Console (Cloud Shell).
- Copying with the UI provided by GCP
The GCP console offers this as a UI inside BigQuery, but a job must be created manually for each dataset, so when there are many datasets to copy it is more convenient to copy every dataset in the project with Python.
- Copying with Python in Cloud Shell
To change the region:
- Create a backup copy (Seoul) of the source dataset (US).
- Delete the source dataset (US).
- Copy the backup dataset (Seoul) into a dataset (Seoul) with the original name.
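The copy step itself is driven by a DTS transfer config whose data source is `cross_region_copy`. As a minimal sketch of the request body that would be passed to `create_transfer_config` (the helper name `build_copy_config` and the example project/dataset names are my own, not from the original scripts; only the field names follow the DTS API):

```python
# Sketch: build the transfer-config body for a cross-region dataset
# copy (DTS data source "cross_region_copy").
def build_copy_config(source_project, source_dataset, dest_dataset):
    return {
        # destination dataset must already exist in the target region (Seoul)
        "destination_dataset_id": dest_dataset,
        "display_name": "copy {} -> {}".format(source_dataset, dest_dataset),
        "data_source_id": "cross_region_copy",
        "params": {
            "source_project_id": source_project,
            "source_dataset_id": source_dataset,
        },
    }

cfg = build_copy_config("my-project", "sales_US", "sales")
print(cfg["data_source_id"])
```

With a real client this body would be sent via `transfer_client.create_transfer_config(parent=parent, transfer_config=cfg)`, once per dataset, which is why looping over the project's datasets in Python beats clicking through the UI.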
Before migrating, the procedures, scheduled queries, view tables, external tables, and models in each dataset must be checked.
These are the objects that DTS skips when copying a dataset:
1. routine (procedure)
2. scheduled query

```python
# status 0 : active
# status 4 : close
from google.cloud import bigquery_datatransfer
from google.cloud import bigquery
import sys

try:
    arg = sys.argv[1]
except IndexError:
    print("The project is required.")
    sys.exit(1)

if arg == "1":    # project #1
    key_path = "/home/whiseung/workspace/keys/project1.json"
elif arg == "2":  # project #2
    key_path = "/home/whiseung/workspace/keys/project2.json"
elif arg == "3":  # project #3
    key_path = "/home/whiseung/workspace/keys/project3.json"
else:
    print("The project does not exist.")
    sys.exit(1)

transfer_client = bigquery_datatransfer.DataTransferServiceClient.from_service_account_json(key_path)
client = bigquery.Client.from_service_account_json(key_path)
project = client.project

## us region
parent = transfer_client.common_project_path(project)
## seoul region
# parent = transfer_client.common_project_path(project + "/locations/asia-northeast3")
print(parent)

configs = transfer_client.list_transfer_configs(parent=parent)
print("Configs:")
for config in configs:
    if config.destination_dataset_id != "":
        print(f"\tDataset: {config.destination_dataset_id}.{config.display_name}, state: {config.state}, Schedule: {config.schedule}")
```
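The scheduled-query check boils down to keeping only the transfer configs that write into a dataset. The same filter can be sketched standalone over plain dicts (the helper name and the sample data are invented for illustration):

```python
# Sketch of the filter above: keep only transfer configs that have a
# destination dataset, i.e. scheduled queries that write somewhere.
def scheduled_query_targets(configs):
    return [
        (c["destination_dataset_id"], c["display_name"])
        for c in configs
        if c.get("destination_dataset_id")
    ]

sample = [
    {"destination_dataset_id": "sales", "display_name": "daily_agg"},
    {"destination_dataset_id": "", "display_name": "export_to_gcs"},
]
print(scheduled_query_targets(sample))  # only "daily_agg" survives
```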
3. model : code that extracts the list of models

```python
from google.cloud import bigquery
import datetime

print("Capture Time : " + str(datetime.datetime.now() + datetime.timedelta(hours=9)))

keys = ["/home/whiseung/workspace/keys/project1.json"
       ,"/home/whiseung/workspace/keys/project2.json"
       ,"/home/whiseung/workspace/keys/project3.json"]

for key in keys:
    client = bigquery.Client.from_service_account_json(key)
    datasets = list(client.list_datasets())  # Make an API request.
    project = client.project

    if datasets:
        for dataset in datasets:
            dst = dataset.dataset_id
            ## only datasets that are not backups
            if '_bak' not in dst and '_cloocus_backup' not in dst and '_US' not in dst:
                models = client.list_models(dst)  # Make an API request.
                for model in models:
                    full_model_id = "{},{},{}".format(
                        model.project, model.dataset_id, model.model_id
                    )
                    print(full_model_id)
```
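The backup-name filter is repeated in each of these scripts; it can be pulled into one helper (the suffix list mirrors the checks used above; the function name is my own):

```python
# Substrings that mark a dataset as a backup/migration artifact,
# matching the checks in the scripts above.
BACKUP_MARKERS = ("_bak", "_cloocus_backup", "_US")

def is_backup_dataset(dataset_id: str) -> bool:
    """True when the dataset id contains any backup marker."""
    return any(marker in dataset_id for marker in BACKUP_MARKERS)

print(is_backup_dataset("sales_bak"))  # True
print(is_backup_dataset("sales"))      # False
```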
4. view table : code that extracts view table DDL

```python
from google.cloud import bigquery
import datetime
import sys

def keySetting():
    try:
        arg = sys.argv[1]
    except IndexError:
        print("The project is required.")
        sys.exit(1)

    if arg == "1":    # project #1
        key_path = "/home/whiseung/workspace/keys/project1.json"
    elif arg == "2":  # project #2
        key_path = "/home/whiseung/workspace/keys/project2.json"
    elif arg == "3":  # project #3
        key_path = "/home/whiseung/workspace/keys/project3.json"
    else:
        print("The project does not exist.")
        sys.exit(1)
    return key_path

key_path = keySetting()
client = bigquery.Client.from_service_account_json(key_path)
datasets = list(client.list_datasets())  # Make an API request.
project = client.project

if datasets:
    print("-- Capture Time : " + str(datetime.datetime.now() + datetime.timedelta(hours=9)))
    for dataset in datasets:
        dst = dataset.dataset_id
        ## only datasets that are not backups
        if '_bak' not in dst:
            query_job = client.query(
                "SELECT table_catalog, table_schema, table_name, table_type, ddl "
                "FROM `{}.{}.INFORMATION_SCHEMA.TABLES` "
                "WHERE table_type = 'VIEW' "
                "ORDER BY table_catalog, table_schema, table_name, table_type".format(project, dst)
            )
            results = query_job.result()
            for row in results:
                print("-- {}.{}.{}".format(row[0], row[1], row[2]))
                print(row[4])
```
5. external table : code that extracts external table DDL

```python
from google.cloud import bigquery
import datetime
import sys

def keySetting():
    try:
        arg = sys.argv[1]
    except IndexError:
        print("The project is required.")
        sys.exit(1)

    if arg == "1":    # project #1
        key_path = "/home/whiseung/workspace/keys/project1.json"
    elif arg == "2":  # project #2
        key_path = "/home/whiseung/workspace/keys/project2.json"
    elif arg == "3":  # project #3
        key_path = "/home/whiseung/workspace/keys/project3.json"
    else:
        print("The project does not exist.")
        sys.exit(1)
    return key_path

key_path = keySetting()
client = bigquery.Client.from_service_account_json(key_path)
datasets = list(client.list_datasets())  # Make an API request.
project = client.project

if datasets:
    print("-- Capture Time : " + str(datetime.datetime.now() + datetime.timedelta(hours=9)))
    for dataset in datasets:
        dst = dataset.dataset_id
        ## only datasets that are not backups
        if '_bak' not in dst:
            query_job = client.query(
                "SELECT table_catalog, table_schema, table_name, table_type, ddl "
                "FROM `{}.{}.INFORMATION_SCHEMA.TABLES` "
                "WHERE table_type = 'EXTERNAL' "
                "ORDER BY table_catalog, table_schema, table_name, table_type".format(project, dst)
            )
            results = query_job.result()
            for row in results:
                print("-- {}.{}.{}".format(row[0], row[1], row[2]))
                print(row[4])
```
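The view-table and external-table scripts differ only in the `table_type` filter, so the query string can be parameterized (a sketch; `ddl_query` is my own name, the SQL itself matches the scripts above):

```python
def ddl_query(project: str, dataset: str, table_type: str) -> str:
    """Build the INFORMATION_SCHEMA query that extracts DDL for one table type."""
    return (
        "SELECT table_catalog, table_schema, table_name, table_type, ddl "
        "FROM `{}.{}.INFORMATION_SCHEMA.TABLES` "
        "WHERE table_type = '{}' "
        "ORDER BY table_catalog, table_schema, table_name, table_type".format(
            project, dataset, table_type
        )
    )

# One function now serves both checks ('VIEW' and 'EXTERNAL').
print(ddl_query("my-project", "sales", "VIEW"))
```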
- Python code sample
https://whiseung.tistory.com/entry/BigQuery-Data-Transfer-Service
- Running the Python code
https://whiseung.tistory.com/entry/Google-Cloud-Shell-%EA%B0%84%EB%8B%A8-%EC%82%AC%EC%9A%A9%EB%B2%95