Developing with VSCode.
Accessing Azure Blob Storage and extracting data with Computer Vision
1. VM - Basic setup
### ModuleNotFoundError: No module named 'azure'
PS C:\workspace\project> **pip install azure**
Collecting azure
Downloading azure-5.0.0.zip (4.6 kB)
Downloading azure-4.0.0-py2.py3-none-any.whl (2.2 kB)
Collecting azure-mgmt~=4.0
Downloading azure_mgmt-4.0.0-py2.py3-none-any.whl (3.0 kB)
Collecting azure-eventgrid~=1.1
Downloading azure_eventgrid-1.3.0-py2.py3-none-any.whl (167 kB)
|████████████████████████████████| 167 kB 2.2 MB/s
...
Installing collected packages:
oauthlib, requests-oauthlib, PyJWT, isodate, azure-nspkg, msrest, azure-mgmt-nspkg, adal, msrestazure, azure-mgmt-datalake-nspkg, azure-common, azure-storage-common, azure-mgmt-web, azure-mgmt-trafficmanager, azure-mgmt-subscription, azure-mgmt-storage, azure-mgmt-sql, azure-mgmt-signalr, azure-mgmt-servicefabric, azure-mgmt-servicebus, azure-mgmt-search, azure-mgmt-scheduler, azure-mgmt-resource, azure-mgmt-reservations, azure-mgmt-relay, azure-mgmt-redis, azure-mgmt-recoveryservicesbackup, azure-mgmt-recoveryservices, azure-mgmt-rdbms, azure-mgmt-powerbiembedded, azure-mgmt-policyinsights, azure-mgmt-notificationhubs, azure-mgmt-network, azure-mgmt-msi, azure-mgmt-monitor, azure-mgmt-media, azure-mgmt-marketplaceordering, azure-mgmt-maps, azure-mgmt-managementpartner, azure-mgmt-managementgroups, azure-mgmt-machinelearningcompute, azure-mgmt-logic, azure-mgmt-loganalytics, azure-mgmt-keyvault, azure-mgmt-iothubprovisioningservices, azure-mgmt-iothub, azure-mgmt-iotcentral, azure-mgmt-hanaonazure, azure-mgmt-eventhub, azure-mgmt-eventgrid, azure-mgmt-dns, azure-mgmt-devtestlabs, azure-mgmt-devspaces, azure-mgmt-datamigration, azure-mgmt-datalake-store, azure-mgmt-datalake-analytics, azure-mgmt-datafactory, azure-mgmt-cosmosdb, azure-mgmt-containerservice, azure-mgmt-containerregistry, azure-mgmt-containerinstance, azure-mgmt-consumption, azure-mgmt-compute, azure-mgmt-commerce, azure-mgmt-cognitiveservices, azure-mgmt-cdn, azure-mgmt-billing, azure-mgmt-batchai, azure-mgmt-batch, azure-mgmt-authorization, azure-mgmt-applicationinsights, azure-mgmt-advisor, azure-cosmosdb-nspkg, azure-storage-queue, azure-storage-file, azure-storage-blob, azure-servicemanagement-legacy, azure-servicefabric, azure-servicebus, azure-mgmt, azure-loganalytics, azure-keyvault, azure-graphrbac, azure-eventgrid, azure-datalake-store, azure-cosmosdb-table, azure-batch, azure-applicationinsights, azure
WARNING: You are using pip version 21.1.2; however, version 21.3 is available.
You should consider upgrading via the 'c:\miniconda\python.exe -m pip install --upgrade pip' command.
### ModuleNotFoundError: No module named 'azure.cognitiveservices'
PS C:\workspace\dmil> **pip install azure-cognitiveservices-vision-computervision**
Collecting azure-cognitiveservices-vision-computervision
Downloading azure_cognitiveservices_vision_computervision-0.9.0-py2.py3-none-any.whl (39 kB)
...
Installing collected packages: azure-cognitiveservices-vision-computervision
Successfully installed azure-cognitiveservices-vision-computervision-0.9.0
WARNING: You are using pip version 21.1.2; however, version 21.3 is available.
You should consider upgrading via the 'c:\miniconda\python.exe -m pip install --upgrade pip' command.
### ImportError: cannot import name 'BlobServiceClient' from 'azure.storage.blob' (C:\Miniconda\lib\site-packages\azure\storage\blob\__init__.py)
azure-storage-blob 1.5.0
azure-storage-common 1.4.2
azure-storage-file 1.4.0
azure-storage-queue 1.4.0
PS C:\workspace\project> **pip uninstall azure-storage-blob**
Found existing installation: azure-storage-blob 1.5.0
Uninstalling azure-storage-blob-1.5.0:
Would remove:
c:\miniconda\lib\site-packages\azure\storage\blob\*
c:\miniconda\lib\site-packages\azure_storage_blob-1.5.0.dist-info\*
Proceed (y/n)? y
Successfully uninstalled azure-storage-blob-1.5.0
PS C:\workspace\dmil> **pip install azure-storage-blob==12.0.0**
Collecting azure-storage-blob==12.0.0
Downloading azure_storage_blob-12.0.0-py2.py3-none-any.whl (271 kB)
|████████████████████████████████| 271 kB 2.2 MB/s
Requirement already satisfied: cryptography>=2.1.4 in c:\miniconda\lib\site-packages (from azure-storage-blob==12.0.0) (3.4.7)
Collecting azure-core<2.0.0,>=1.0.0
Downloading azure_core-1.19.0-py2.py3-none-any.whl (176 kB)
|████████████████████████████████| 176 kB 6.4 MB/s
...
Successfully installed azure-core-1.19.0 azure-storage-blob-12.0.0
PS C:\workspace\project> **pip install pillow**
PS C:\workspace\project> **pip install pyspark**
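The `ImportError` above comes from the version gap: `BlobServiceClient` belongs to the 12.x line of `azure-storage-blob`, while the `azure` meta-package pulled in 1.5.0. A minimal sketch of a pre-flight check follows, assuming Python 3.8+ for `importlib.metadata`. The helper names `major_version`, `supports_blob_service_client` and `installed_version` are hypothetical and not part of any Azure SDK.

```python
from importlib import metadata


def major_version(version: str) -> int:
    """Return the leading numeric component of a version string such as '12.0.0'."""
    return int(version.split(".")[0])


def supports_blob_service_client(version: str) -> bool:
    """BlobServiceClient ships with the 12.x SDK line; 1.x releases lack it."""
    return major_version(version) >= 12


def installed_version(package: str = "azure-storage-blob"):
    """Return the installed version string, or None if the package is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_version()
    if found is None:
        print("azure-storage-blob is not installed")
    elif supports_blob_service_client(found):
        print(f"azure-storage-blob {found}: BlobServiceClient is available")
    else:
        print(f"azure-storage-blob {found}: upgrade to 12.x for BlobServiceClient")
```

Running this before the main script shows right away whether the uninstall and reinstall step is needed.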
2. Connect to Blob Storage and list the files
from azure.storage.blob import BlobServiceClient
from azure.storage.blob import ContainerClient
###
### Connection variables setting
###
STORAGE_ACCOUNT = "account"
STORAGE_CONSTR = "DefaultEndpointsProtocol=https;AccountName=account;AccountKey=key;EndpointSuffix=core.windows.net"
SOURCE_NAME = "container"
###
### Connect to container
###
download_container = ContainerClient.from_connection_string(
    conn_str=STORAGE_CONSTR,
    container_name=SOURCE_NAME
)
# blob storage image file list
blob_list = download_container.list_blobs()
# print(blob_list)
print("========= Process start ==========\n")
for blob in blob_list:
    print(blob.name)
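The listing returns every blob in the container, and step 3 sends each one to the Read operation. If the container can hold non-image files, a filter on the blob name is one way to skip them. This is only a sketch: `READABLE_EXTENSIONS`, `is_readable_blob` and `filter_readable` are hypothetical names, and the extension set is an assumption to be checked against the service documentation.

```python
from pathlib import PurePosixPath

# Assumed set of extensions worth sending to the Read operation; adjust as needed.
READABLE_EXTENSIONS = {".jpg", ".jpeg", ".png", ".bmp", ".tif", ".tiff", ".pdf"}


def is_readable_blob(blob_name: str) -> bool:
    """True when the blob name ends in one of the assumed extensions (case-insensitive)."""
    return PurePosixPath(blob_name).suffix.lower() in READABLE_EXTENSIONS


def filter_readable(blob_names):
    """Keep only the names that pass is_readable_blob, preserving order."""
    return [name for name in blob_names if is_readable_blob(name)]


if __name__ == "__main__":
    sample = ["scans/invoice01.JPG", "scans/readme.txt", "receipt.png", "notes"]
    print(filter_readable(sample))
```

In the loop above this would be used as `if is_readable_blob(blob.name):` before doing anything with the blob.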
3. Computer Vision
from azure.storage.blob import BlobServiceClient
from azure.storage.blob import ContainerClient
from azure.cognitiveservices.vision.computervision import ComputerVisionClient
from azure.cognitiveservices.vision.computervision.models import OperationStatusCodes
from azure.cognitiveservices.vision.computervision.models import VisualFeatureTypes
from msrest.authentication import CognitiveServicesCredentials
import os
import sys
import time
import datetime
###
### Authenticates your credentials and creates a client.
###
subscription_key = "subscription_key"
endpoint = "https://endpoint.cognitiveservices.azure.com/"
computervision_client = ComputerVisionClient(endpoint, CognitiveServicesCredentials(subscription_key))
###
### Connection variables setting
###
STORAGE_ACCOUNT = "account"
STORAGE_CONSTR = "DefaultEndpointsProtocol=https;AccountName=account;AccountKey=key;EndpointSuffix=core.windows.net"
SOURCE_NAME = "container"
TARGET_NAME = "target-container"
###
### Connect to container
###
download_container = ContainerClient.from_connection_string(
    conn_str=STORAGE_CONSTR,
    container_name=SOURCE_NAME
)
upload_container = ContainerClient.from_connection_string(
    conn_str=STORAGE_CONSTR,
    container_name=TARGET_NAME
)
# blob storage image file list
blob_list = download_container.list_blobs()
print("========= Process start ==========\n")
for blob in blob_list:
    read_image_url = "https://" + STORAGE_ACCOUNT + ".blob.core.windows.net/" + SOURCE_NAME + "/" + blob.name
    print("===== Read File URL : " + read_image_url + " =====")
    # Call API with URL and raw response (allows you to get the operation location)
    read_response = computervision_client.read(read_image_url, raw=True)
    # Get the operation location (URL with an ID at the end) from the response
    read_operation_location = read_response.headers["Operation-Location"]
    # Grab the ID from the URL
    operation_id = read_operation_location.split("/")[-1]
    formattedDate = datetime.datetime.now().strftime("%Y%m%d_%H%M%S")
    print("##### start => " + formattedDate)
    # Call the "GET" API and wait for it to retrieve the results
    while True:
        read_result = computervision_client.get_read_result(operation_id)
        if read_result.status not in ['notStarted', 'running']:
            print("read_result.status : " + read_result.status)
            break
        time.sleep(1)
    # Print the detected text, line by line
    if read_result.status == OperationStatusCodes.succeeded:
        # Timestamp to append to the file name
        # timestamp=str(pydatetime.datetime.now().timestamp()).split('.')[0]
        # yyyymmdd_hhMMss string to append to the file name
        formattedDate = datetime.datetime.now().strftime("%Y%m%d_%H%M%S")
        # Output file name (original file name + file creation time)
        file_name = blob.name + "_" + formattedDate + ".txt"
        # Text to upload
        upload_text = ""
        for text_result in read_result.analyze_result.read_results:
            for line in text_result.lines:
                if line.appearance is not None:
                    upload_text += str(line.appearance.style.confidence) + ", " + line.text + "\n"
                    print(line.text)
                else:
                    upload_text += "None, " + line.text + "\n"
                    print(line.text)
        formattedDate = datetime.datetime.now().strftime("%Y%m%d_%H%M%S")
        print("##### end => " + formattedDate)
        # Blob upload
        upload_container.upload_blob(file_name, upload_text, overwrite=True)
        print("===== Blob upload complete =====\n")
print("\n========== Process end ==========")
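The `while True` loop in the script polls `get_read_result` once a second with no upper bound, so a stuck operation would hang the whole run. Below is a minimal sketch of the same polling with a timeout. `wait_for_result` and its parameters are hypothetical names, and the demo uses a stand-in iterator rather than the real service.

```python
import time


def wait_for_result(fetch, is_pending, timeout_s=60.0, interval_s=1.0,
                    clock=time.monotonic, sleep=time.sleep):
    """Call fetch() until is_pending(result) is False, or raise after timeout_s."""
    deadline = clock() + timeout_s
    while True:
        result = fetch()
        if not is_pending(result):
            return result
        if clock() >= deadline:
            raise TimeoutError(f"operation still pending after {timeout_s} s")
        sleep(interval_s)


if __name__ == "__main__":
    # Stand-in for the service: reports 'notStarted', then 'running', then 'succeeded'.
    statuses = iter(["notStarted", "running", "succeeded"])
    final = wait_for_result(
        fetch=lambda: next(statuses),
        is_pending=lambda s: s in ("notStarted", "running"),
        interval_s=0.0,
    )
    print(final)
```

With the client from this post, the call would presumably look like `wait_for_result(lambda: computervision_client.get_read_result(operation_id), lambda r: r.status in ("notStarted", "running"))`.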