GCP/Apps Script

Creating a web page with Apps Script that takes the URL of a PDF, image, etc. and outputs the OCR result

whistory 2023. 3. 16. 09:59

💡 Build a web page with Apps Script, accept the URL of a PDF or image on that page,
     and print the OCR result on the same page as text.

This post combines the two earlier posts below.

OCR image files / PDF files with Apps Script

💡 Extract text from an image or PDF file using Apps Script. It works by uploading the image or PDF file to Google Drive as a temporary file in OCR-converted form. In the left-hand menu of Apps Script, under Services, [

whiseung.tistory.com

Creating a simple web page with Apps Script

You can create a simple web page with Apps Script. From this web page you can enter data into BigQuery, a database, or Google Sheets. It uses doGet(), a Simple Trigger. function doGet(e) { return HtmlService.createT

whiseung.tistory.com

Code.gs

function doGet(e) {
    Logger.log(JSON.stringify(e));
    var htmlOutput = HtmlService.createTemplateFromFile('inputUrl.html');

    htmlOutput.url = getUrl();
    const input_url = e.parameter['input_url'];
    const result = readTextFromFile(input_url);
    htmlOutput.result = result;

    Logger.log("Input Url = " + input_url);
    Logger.log("Ocr result = " + result);

    return htmlOutput.evaluate();
}

function getUrl() {
  const url = ScriptApp.getService().getUrl();
  return url;
}

function readTextFromFile(url) {
    // Initial page load: no URL has been submitted yet, so skip OCR
    // and return an empty string for the template to render.
    if ( !url ) {
        return '';
    }
    
    const contentBlob = UrlFetchApp.fetch(url).getBlob();
    const resource = {
        title : contentBlob.getName(),
        mimeType : contentBlob.getContentType()
    }
    const options = {
        ocr : true
    }
    const docFile = Drive.Files.insert(resource, contentBlob, options);
    const doc = DocumentApp.openById(docFile.id);
    const text = doc.getBody().getText();
    
    Drive.Files.remove(docFile.id);
    return text;
}
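
In readTextFromFile above, Drive.Files.remove is only reached if DocumentApp.openById and getText both succeed. If either one throws, the temporary OCR document stays in Drive. One possible way to harden this is a try/finally wrapper. The sketch below is only an illustration: the helper name withTempFile and its three callback parameters are hypothetical and are not part of the original code or of any Apps Script API.

```javascript
/**
 * Creates a temporary resource, passes it to `use`, and always calls
 * `remove` afterwards, even when `use` throws.
 * `create`, `use`, and `remove` are hypothetical callbacks supplied by the caller.
 */
function withTempFile(create, use, remove) {
  const file = create();
  try {
    return use(file);
  } finally {
    // Runs on success and on error, so the temporary file is not left behind.
    remove(file);
  }
}

// Assumed usage inside readTextFromFile (Apps Script only, not runnable elsewhere):
// const text = withTempFile(
//   () => Drive.Files.insert(resource, contentBlob, options),
//   (docFile) => DocumentApp.openById(docFile.id).getBody().getText(),
//   (docFile) => Drive.Files.remove(docFile.id)
// );
```

If create itself throws, nothing was uploaded, so remove is not called in that case.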

inputUrl.html

<!DOCTYPE html>
<html>
  <head>
    <base target="_top">
    <link href="https://cdn.jsdelivr.net/npm/bootstrap@5.0.0-beta3/dist/css/bootstrap.min.css" rel="stylesheet"
    integrity="sha384-eOJMYsd53ii+scO/bJGFsiCZc+5NDVN2yr8+0RDqr0Ql0h+rP48ckxlpbzKgwra6" crossorigin="anonymous">

    <!-- <script type="text/javascript">
      alert("ddd");
    </script> -->
  </head>
  
  <body>
    <form action="<?= url ?>" method="GET">
      <div class="container">
        <div class="row frame">
          <h5 class="mt-4 text-center">URL to OCR</h5>
          <h6 class="mb-4 text-center">Shows the OCR result for the PDF or image at the entered URL</h6>
          <!-- create form element here -->
          <div class="form-group mb-4 box">
            <input type="text" class="form-control inp mb-3" id="input_url" name="input_url" placeholder="Enter URL" autocomplete="off">
          </div> 
          <!-- create form until element here -->

          <span><?= result ?></span>

          <div class="form-group mt-4 mb-4 text-center">
            <input type="submit" class="btn btn-info" name="Submit" /><br>
          </div>

        </div>
      </div>
    </form>

  <script src="https://cdn.jsdelivr.net/npm/@popperjs/core@2.9.1/dist/umd/popper.min.js" integrity="sha384-SR1sx49pcuLnqZUnnPwx6FCym0wLsk5JZuNx2bPPENzswTNFaQU1RDvt3wT4gWFG" crossorigin="anonymous"></script>
  <script src="https://cdn.jsdelivr.net/npm/bootstrap@5.0.0-beta3/dist/js/bootstrap.min.js" integrity="sha384-j0CNLUeiqtyaRmlzUHCPZ+Gy5fQu0dQ6eZ/xAww941Ai1SxSY+0EQqNXNE6DZiVc" crossorigin="anonymous"></script>
  </body>
</html>
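
The form above sends the text field as the input_url query parameter, and doGet reads it from e.parameter. When doGet is run directly from the script editor, no event object is passed, so e.parameter would throw. A minimal sketch of reading and checking the value defensively follows. getInputUrl and isHttpUrl are hypothetical helper names, and the regular expression is only a rough assumption about what counts as a usable URL.

```javascript
/**
 * Returns the trimmed `input_url` query parameter, or '' when it is missing.
 * Tolerates `e` being undefined (e.g. doGet run from the script editor).
 */
function getInputUrl(e) {
  const params = (e && e.parameter) || {};
  const value = params['input_url'];
  return typeof value === 'string' ? value.trim() : '';
}

/**
 * Hypothetical check: accepts only absolute http(s) URLs with a host
 * and no whitespace. This is a coarse filter, not a full URL parser.
 */
function isHttpUrl(value) {
  return /^https?:\/\/[^\s/]+(\/\S*)?$/i.test(value);
}
```

In doGet, readTextFromFile could then be called only when isHttpUrl(getInputUrl(e)) is true, and the template could show a short message otherwise.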

Image URL

https://i.stack.imgur.com/i1Abv.png

PDF OCR

https://www.africau.edu/images/default/sample.pdf
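
Both sample URLs point directly at a PNG or PDF file. A URL that returns something else, such as an HTML page, would still be uploaded by readTextFromFile. One way to guard against that is to look at contentBlob.getContentType() before calling Drive.Files.insert. The sketch below assumes that PNG, JPEG, GIF, and PDF are the types worth sending to OCR. That list, the constant OCR_CONTENT_TYPES, and the function isOcrCandidate are assumptions for illustration, not something the original code defines.

```javascript
// Assumed allow-list of content types to pass on to Drive OCR.
const OCR_CONTENT_TYPES = ['image/png', 'image/jpeg', 'image/gif', 'application/pdf'];

/**
 * Returns true when the given Content-Type value is on the allow-list.
 * Parameters such as "; charset=binary" and letter case are ignored.
 */
function isOcrCandidate(contentType) {
  if (typeof contentType !== 'string') {
    return false;
  }
  const base = contentType.split(';')[0].trim().toLowerCase();
  return OCR_CONTENT_TYPES.indexOf(base) !== -1;
}
```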

https://script.google.com/macros/s/AKfycbzY4RxgEuLb4rWM5Kk681H1UJZXAuTvPnJ108NINF6vVy3nA0uPIKYzPU7VHoBBHBsNQg/exec
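
Because the form uses method GET, submitting it amounts to requesting the web app URL with input_url appended as a query parameter. The same request can be built by hand, as sketched below. buildOcrRequestUrl is a hypothetical helper, and DEPLOYMENT_ID in the usage comment is a placeholder, not a real deployment.

```javascript
/**
 * Builds the web app URL that the form would request for a given target.
 * `execUrl` is the deployed /exec URL, `targetUrl` is the PDF or image URL.
 * The target is percent-encoded so its own ':' and '/' survive as one value.
 */
function buildOcrRequestUrl(execUrl, targetUrl) {
  return execUrl + '?input_url=' + encodeURIComponent(targetUrl);
}

// Example with a placeholder deployment ID:
// buildOcrRequestUrl(
//   'https://script.google.com/macros/s/DEPLOYMENT_ID/exec',
//   'https://i.stack.imgur.com/i1Abv.png'
// );
// → 'https://script.google.com/macros/s/DEPLOYMENT_ID/exec?input_url=https%3A%2F%2Fi.stack.imgur.com%2Fi1Abv.png'
```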

 
