Azure

VSCode + Azure Blob Storage + Cognitive Services Computer Vision

whistory 2022. 9. 13. 15:43



Developing with VSCode.

Access Azure Blob Storage and extract text data from the stored images with the Computer Vision Read API.

 

1. VM - Basic setup

### ModuleNotFoundError: No module named 'azure'

PS C:\workspace\project> **pip install azure**
Collecting azure
  Downloading azure-5.0.0.zip (4.6 kB)
Downloading azure-4.0.0-py2.py3-none-any.whl (2.2 kB)
Collecting azure-mgmt~=4.0
  Downloading azure_mgmt-4.0.0-py2.py3-none-any.whl (3.0 kB)
Collecting azure-eventgrid~=1.1
  Downloading azure_eventgrid-1.3.0-py2.py3-none-any.whl (167 kB)
     |████████████████████████████████| 167 kB 2.2 MB/s
...
Installing collected packages: 
oauthlib, requests-oauthlib, PyJWT, isodate, azure-nspkg, msrest, azure-mgmt-nspkg, adal, msrestazure, azure-mgmt-datalake-nspkg, azure-common, azure-storage-common, azure-mgmt-web, azure-mgmt-trafficmanager, azure-mgmt-subscription, azure-mgmt-storage, azure-mgmt-sql, azure-mgmt-signalr, azure-mgmt-servicefabric, azure-mgmt-servicebus, azure-mgmt-search, azure-mgmt-scheduler, azure-mgmt-resource, azure-mgmt-reservations, azure-mgmt-relay, azure-mgmt-redis, azure-mgmt-recoveryservicesbackup, azure-mgmt-recoveryservices, azure-mgmt-rdbms, azure-mgmt-powerbiembedded, azure-mgmt-policyinsights, azure-mgmt-notificationhubs, azure-mgmt-network, azure-mgmt-msi, azure-mgmt-monitor, azure-mgmt-media, azure-mgmt-marketplaceordering, azure-mgmt-maps, azure-mgmt-managementpartner, azure-mgmt-managementgroups, azure-mgmt-machinelearningcompute, azure-mgmt-logic, azure-mgmt-loganalytics, azure-mgmt-keyvault, azure-mgmt-iothubprovisioningservices, azure-mgmt-iothub, azure-mgmt-iotcentral, azure-mgmt-hanaonazure, azure-mgmt-eventhub, azure-mgmt-eventgrid, azure-mgmt-dns, azure-mgmt-devtestlabs, azure-mgmt-devspaces, azure-mgmt-datamigration, azure-mgmt-datalake-store, azure-mgmt-datalake-analytics, azure-mgmt-datafactory, azure-mgmt-cosmosdb, azure-mgmt-containerservice, azure-mgmt-containerregistry, azure-mgmt-containerinstance, azure-mgmt-consumption, azure-mgmt-compute, azure-mgmt-commerce, azure-mgmt-cognitiveservices, azure-mgmt-cdn, azure-mgmt-billing, azure-mgmt-batchai, azure-mgmt-batch, azure-mgmt-authorization, azure-mgmt-applicationinsights, azure-mgmt-advisor, azure-cosmosdb-nspkg, azure-storage-queue, azure-storage-file, azure-storage-blob, azure-servicemanagement-legacy, azure-servicefabric, azure-servicebus, azure-mgmt, azure-loganalytics, azure-keyvault, azure-graphrbac, azure-eventgrid, azure-datalake-store, azure-cosmosdb-table, azure-batch, azure-applicationinsights, azure
WARNING: You are using pip version 21.1.2; however, version 21.3 is available.
You should consider upgrading via the 'c:\miniconda\python.exe -m pip install --upgrade pip' command.

### ModuleNotFoundError: No module named 'azure.cognitiveservices'

PS C:\workspace\dmil> **pip install azure-cognitiveservices-vision-computervision**
Collecting azure-cognitiveservices-vision-computervision
  Downloading azure_cognitiveservices_vision_computervision-0.9.0-py2.py3-none-any.whl (39 kB)
...
Installing collected packages: azure-cognitiveservices-vision-computervision
Successfully installed azure-cognitiveservices-vision-computervision-0.9.0
WARNING: You are using pip version 21.1.2; however, version 21.3 is available.
You should consider upgrading via the 'c:\miniconda\python.exe -m pip install --upgrade pip' command.

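### ImportError: cannot import name 'BlobServiceClient' from 'azure.storage.blob' (C:\Miniconda\lib\site-packages\azure\storage\blob\__init__.py)

The package list shows that the earlier azure install brought in the legacy azure-storage-blob 1.5.0, which does not export BlobServiceClient (the class was introduced with the version 12 SDK):
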
azure-storage-blob                            1.5.0
azure-storage-common                          1.4.2
azure-storage-file                            1.4.0
azure-storage-queue                           1.4.0

PS C:\workspace\project> **pip uninstall azure-storage-blob**
Found existing installation: azure-storage-blob 1.5.0
Uninstalling azure-storage-blob-1.5.0:
  Would remove:
    c:\miniconda\lib\site-packages\azure\storage\blob\*
    c:\miniconda\lib\site-packages\azure_storage_blob-1.5.0.dist-info\*
Proceed (y/n)? y
  Successfully uninstalled azure-storage-blob-1.5.0
PS C:\workspace\dmil> **pip install azure-storage-blob==12.0.0**
Collecting azure-storage-blob==12.0.0
  Downloading azure_storage_blob-12.0.0-py2.py3-none-any.whl (271 kB)
     |████████████████████████████████| 271 kB 2.2 MB/s
Requirement already satisfied: cryptography>=2.1.4 in c:\miniconda\lib\site-packages (from azure-storage-blob==12.0.0) (3.4.7)
Collecting azure-core<2.0.0,>=1.0.0
  Downloading azure_core-1.19.0-py2.py3-none-any.whl (176 kB)
     |████████████████████████████████| 176 kB 6.4 MB/s
...
Successfully installed azure-core-1.19.0 azure-storage-blob-12.0.0
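
The ImportError above depends only on which major version of azure-storage-blob is installed, so it can help to check that before running the scripts. The following is a minimal sketch, not part of the original setup: the helper names is_v12_or_newer and installed_blob_sdk_version are hypothetical, and it assumes Python 3.8 or later for importlib.metadata.

```python
from importlib import metadata


def is_v12_or_newer(version: str) -> bool:
    """Return True if a version string such as '12.0.0' has a major version >= 12."""
    major = version.split(".")[0]
    return major.isdigit() and int(major) >= 12


def installed_blob_sdk_version():
    """Return the installed azure-storage-blob version, or None if it is not installed."""
    try:
        return metadata.version("azure-storage-blob")
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_blob_sdk_version()
    if found is None:
        print("azure-storage-blob is not installed")
    elif is_v12_or_newer(found):
        print("azure-storage-blob " + found + " provides BlobServiceClient")
    else:
        print("azure-storage-blob " + found + " is the legacy SDK; install version 12 or later")
```

Checking only the major version keeps the sketch simple; it does not try to handle pre-release or otherwise unusual version strings.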

PS C:\workspace\project> **pip install pillow**
PS C:\workspace\project> **pip install pyspark**

2. Connect to Blob storage and check the file list

from azure.storage.blob import BlobServiceClient
from azure.storage.blob import ContainerClient

###
### Connection variables setting
###
STORAGE_ACCOUNT = "account"
STORAGE_CONSTR = "DefaultEndpointsProtocol=https;AccountName=account;AccountKey=key;EndpointSuffix=core.windows.net"
SOURCE_NAME = "container"

###
### Connect to container
###
download_container = ContainerClient.from_connection_string(
    conn_str=STORAGE_CONSTR,
    container_name=SOURCE_NAME
)

# blob storage image file list
blob_list = download_container.list_blobs()
# print(blob_list)

print("========= Process start ==========\n")
for blob in blob_list:
    print(blob.name)
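
The script above keeps the account name in STORAGE_ACCOUNT and again inside the connection string, so the two can drift apart. As a rough sketch of one way to avoid that, the account name can be read out of the connection string itself. The helper parse_connection_string is a hypothetical name, not an Azure SDK function, and the connection string is the same placeholder used above.

```python
def parse_connection_string(conn_str: str) -> dict:
    """Split 'Key=Value;Key=Value' pairs into a dict.

    Only the first '=' in each pair is treated as the separator, because
    base64-encoded account keys usually end with '=' padding.
    """
    settings = {}
    for pair in conn_str.split(";"):
        if not pair.strip():
            continue  # skip empty segments such as a trailing ';'
        key, _, value = pair.partition("=")
        settings[key.strip()] = value.strip()
    return settings


# Placeholder connection string in the same shape as the one used above.
STORAGE_CONSTR = (
    "DefaultEndpointsProtocol=https;AccountName=account;"
    "AccountKey=key;EndpointSuffix=core.windows.net"
)
settings = parse_connection_string(STORAGE_CONSTR)
STORAGE_ACCOUNT = settings["AccountName"]
print(STORAGE_ACCOUNT)
```

With this, only the connection string needs to be edited when the storage account changes.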

3. Computer Vision

from azure.storage.blob import BlobServiceClient
from azure.storage.blob import ContainerClient
from azure.cognitiveservices.vision.computervision import ComputerVisionClient
from azure.cognitiveservices.vision.computervision.models import OperationStatusCodes
from azure.cognitiveservices.vision.computervision.models import VisualFeatureTypes
from msrest.authentication import CognitiveServicesCredentials

import os
import sys
import time
import datetime

###
### Authenticates your credentials and creates a client.
###
subscription_key = "subscription_key"
endpoint = "https://endpoint.cognitiveservices.azure.com/"

computervision_client = ComputerVisionClient(endpoint, CognitiveServicesCredentials(subscription_key))

###
### Connection variables setting
###
STORAGE_ACCOUNT = "account"
STORAGE_CONSTR = "DefaultEndpointsProtocol=https;AccountName=account;AccountKey=key;EndpointSuffix=core.windows.net"
SOURCE_NAME = "container"
TARGET_NAME = "target-container"

###
### Connect to container
###
download_container = ContainerClient.from_connection_string(
    conn_str=STORAGE_CONSTR,
    container_name=SOURCE_NAME
)
upload_container = ContainerClient.from_connection_string(
    conn_str=STORAGE_CONSTR,
    container_name=TARGET_NAME
)

# blob storage image file list
blob_list = download_container.list_blobs()

print("========= Process start ==========\n")
for blob in blob_list:
    read_image_url="https://"+STORAGE_ACCOUNT+".blob.core.windows.net/"+SOURCE_NAME+"/"+blob.name
    print("===== Read File URL : " + read_image_url + " =====")

    # Call API with URL and raw response (allows you to get the operation location)
    read_response = computervision_client.read(read_image_url,  raw=True)
    # Get the operation location (URL with an ID at the end) from the response
    read_operation_location = read_response.headers["Operation-Location"]
    # Grab the ID from the URL
    operation_id = read_operation_location.split("/")[-1]
    formattedDate = datetime.datetime.now().strftime("%Y%m%d_%H%M%S")
    print("##### start => " + formattedDate)
    # Call the "GET" API and wait for it to retrieve the results 
    while True:
        read_result = computervision_client.get_read_result(operation_id)
        if read_result.status not in ['notStarted', 'running']:
            print("read_result.status : " + read_result.status)
            break
        time.sleep(1)

    # Print the detected text, line by line
    if read_result.status == OperationStatusCodes.succeeded:
        # Timestamp to append to the file name
        # timestamp=str(pydatetime.datetime.now().timestamp()).split('.')[0]
        # yyyymmdd_hhMMss string to append to the file name
        formattedDate = datetime.datetime.now().strftime("%Y%m%d_%H%M%S")

        # Output file name (original file name + file creation time)
        file_name = blob.name+"_"+formattedDate+".txt"
        # Data to upload
        upload_text = ""

        for text_result in read_result.analyze_result.read_results:
            for line in text_result.lines:
                if line.appearance is not None:
                    upload_text += str(line.appearance.style.confidence) + ", " + line.text + "\n"
                    print(line.text)
                else:
                    upload_text += "None, " + line.text + "\n"
                    print(line.text)
        formattedDate = datetime.datetime.now().strftime("%Y%m%d_%H%M%S")
        print("##### end => " + formattedDate)

        # Blob upload
        upload_container.upload_blob(file_name, upload_text, overwrite=True)
        print("===== Blob upload complete =====\n")

print("\n========== Process end ==========")
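
The polling loop in the script above runs until the Read operation leaves the notStarted/running states, so it would spin indefinitely if an operation never finished. The following is a minimal sketch of the same polling pattern with a timeout. The helper wait_for_result and its parameters are hypothetical, and the status sequence is simulated so the example runs without an Azure account.

```python
import time


def wait_for_result(fetch, is_done, interval=1.0, timeout=60.0):
    """Call fetch() repeatedly until is_done(result) is true or the timeout expires."""
    deadline = time.monotonic() + timeout
    while True:
        result = fetch()
        if is_done(result):
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError("operation did not finish within %.1f seconds" % timeout)
        time.sleep(interval)


# Simulated status sequence standing in for repeated get_read_result calls.
statuses = iter(["notStarted", "running", "succeeded"])
final = wait_for_result(
    fetch=lambda: next(statuses),
    is_done=lambda status: status not in ("notStarted", "running"),
    interval=0.01,
)
print(final)
```

In the script above, fetch would presumably wrap computervision_client.get_read_result(operation_id) and is_done would test read_result.status in the same way the while loop does.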