Study

Using the KOSIS (Korean Statistical Information Service, 국가통계포털) statistical data openAPI

whistory 2022. 11. 29. 09:43

 

💡 Collecting statistical data with the openAPI provided by the Korean Statistical Information Service (KOSIS).

 

 

A project came up that needs to pull data from the openAPI provided by KOSIS and load it somewhere.

For now I am guessing the target will be AWS S3. I will test it one step at a time.

 

  1. Using the KOSIS statistical data openAPI
  2. Creating a parquet file with the KOSIS bulk statistical data openAPI
  3. Saving data fetched through the KOSIS openAPI to AWS S3 as a parquet file

 

First, let's check how the KOSIS openAPI is used.

 

 

Go to the KOSIS openAPI site. (https://kosis.kr/openapi/index/index.jsp)

 

Apply for API access, then go to [개발가이드] - [통계자료] (Development Guide - Statistical Data).

Open [URL 생성] (Generate URL) and search for the statistical table you want to use,

then tick its "use" checkbox and click [선택] (Select).

 

 

 

์กฐํšŒํ•  ํ•ญ๋ชฉ๋“ค๊ณผ, ์ถœ๋ ฅํ˜•ํƒœ, ์กฐํšŒ๊ธฐ๊ฐ„์„ ์„ ํƒํ•˜๊ณ  [URL๋ณด๊ธฐ]๋ฅผ ํด๋ฆญํ•œ๋‹ค.

๋ถ„๋ฅ˜๊ฐ’ ์„ ํƒํ™”๋ฉด
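The choices made on this screen end up as query parameters in the generated URL. As a rough sketch, the same URL can also be assembled in Python. The function name `build_kosis_url` is hypothetical, the parameter names are copied from the URL generated for this post, and the meanings given in the comments are my reading of those names rather than something confirmed here.

```python
from urllib.parse import urlencode

BASE_URL = "https://kosis.kr/openapi/Param/statisticsParameterData.do"


def build_kosis_url(api_key, org_id, tbl_id, item_ids, obj_l1, prd_se="Y", new_est_prd_cnt=1):
    """Assemble a KOSIS request URL (hypothetical helper, not part of any library)."""
    params = {
        "method": "getList",
        "apiKey": api_key,
        # The generated URL ends every id with "+", so the same shape is kept here.
        "itmId": "".join(f"{item}+" for item in item_ids),   # items (assumed meaning)
        "objL1": "".join(f"{code}+" for code in obj_l1),      # level-1 classification (assumed)
    }
    # The generator also emits objL2..objL8 as empty parameters; keep them as-is.
    for level in range(2, 9):
        params[f"objL{level}"] = ""
    params.update({
        "format": "json",
        "jsonVD": "Y",
        "prdSe": prd_se,                      # period type, "Y" in this post (assumed: yearly)
        "newEstPrdCnt": new_est_prd_cnt,      # number of most recent periods (assumed)
        "orgId": org_id,
        "tblId": tbl_id,
    })
    # safe="+" keeps the "+" separators literal instead of encoding them as %2B.
    return BASE_URL + "?" + urlencode(params, safe="+")


url = build_kosis_url(
    api_key="apikey",  # placeholder, replace with your own key
    org_id="135",
    tbl_id="DT_135N_1A001A",
    item_ids=["16135A1", "16135A3", "16135A5"],
    obj_l1=["A010", "A011", "A012", "A013", "A014", "A241"],
)
print(url)
```

Building the URL from a dict makes it easier to swap the table id or the item list later without editing one long string by hand.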

 

 

 

Paste the generated URL into a browser and the result is shown in the data format you selected. The same URL can be called from Python:

import json
from urllib.request import urlopen

# URL produced by the KOSIS URL generator; replace "apikey" with your own key.
URL = "https://kosis.kr/openapi/Param/statisticsParameterData.do?method=getList&apiKey=apikey&itmId=16135A1+16135A3+16135A5+&objL1=A010+A011+A012+A013+A014+A241+&objL2=&objL3=&objL4=&objL5=&objL6=&objL7=&objL8=&format=json&jsonVD=Y&prdSe=Y&newEstPrdCnt=1&orgId=135&tblId=DT_135N_1A001A"

with urlopen(URL) as response:
    # The body is UTF-8 encoded JSON: a list with one dict per data point.
    py_json = json.loads(response.read().decode('utf-8'))

for i, v in enumerate(py_json):
    if i == 0:
        # Print the table name once, before the first row.
        print(v['TBL_NM'])

    # period / classification / item / value, separated by tabs
    print(v['PRD_DE'] + "\t/\t  " + v['C1_NM'] + "\t/\t  " + v['ITM_NM'] + "\t/\t" + v['DT'])
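The live call needs a valid API key, so here is a small offline sketch of the same loop with the formatting pulled out into a function. `format_rows` is a hypothetical name, and `sample` is a hand-written list that only mimics the fields this post reads (`TBL_NM`, `PRD_DE`, `C1_NM`, `ITM_NM`, `DT`, `UNIT_NM`), with values taken from the output shown later in the post. It is not a captured API response.

```python
def format_rows(records):
    """Return the table name followed by one tab-separated line per record."""
    lines = []
    for i, rec in enumerate(records):
        if i == 0:
            # The table name is repeated in every record, so take it from the first.
            lines.append(rec["TBL_NM"])
        lines.append("\t/\t".join([rec["PRD_DE"], rec["C1_NM"], rec["ITM_NM"], rec["DT"]]))
    return lines


# Hand-written sample shaped like the decoded JSON list (assumption, not live data).
sample = [
    {"TBL_NM": "범죄의 발생 검거상황(총괄)", "PRD_DE": "2020", "C1_NM": "사기",
     "ITM_NM": "발생건수", "DT": "340925", "UNIT_NM": "건"},
    {"TBL_NM": "범죄의 발생 검거상황(총괄)", "PRD_DE": "2020", "C1_NM": "사기",
     "ITM_NM": "검거건수", "DT": "234065", "UNIT_NM": "건"},
]

for line in format_rows(sample):
    print(line)
```

Keeping the formatting separate from the network call means the parsing logic can be checked without hitting the API.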

 

 

 

 

 

import json
from urllib.request import urlopen
import pandas as pd

# Same request as above, but the rows are collected into a pandas DataFrame.
# Replace "apikey" with your own key.
URL = "https://kosis.kr/openapi/Param/statisticsParameterData.do?method=getList&apiKey=apikey&itmId=16135A1+16135A3+16135A5+&objL1=A010+A011+A012+A013+A014+A241+&objL2=&objL3=&objL4=&objL5=&objL6=&objL7=&objL8=&format=json&jsonVD=Y&prdSe=Y&newEstPrdCnt=1&orgId=135&tblId=DT_135N_1A001A"

with urlopen(URL) as response:
    py_json = json.loads(response.read().decode('utf-8'))

data = []
for i, v in enumerate(py_json):
    if i == 0:
        print(v['TBL_NM'])

    # One row per data point: period, classification, item, value, unit
    data.append([v['PRD_DE'], v['C1_NM'], v['ITM_NM'], v['DT'], v['UNIT_NM']])

df = pd.DataFrame(data, columns=['yyyyymm', 'category1', 'category2', 'value', 'unit'])
print(df)
PS C:\workspace\api> & C:/Users/서휘승/AppData/Local/Microsoft/WindowsApps/python3.8.exe c:/workspace/api/test.py
범죄의 발생 검거상황(총괄)
   yyyyymm            category1 category2   value unit
0     2020                   사기      발생건수  340925    건
1     2020                   사기      검거건수  234065    건
2     2020                   사기      검거인원  240531    명
3     2020             컴퓨터등사용사기      발생건수    7451    건
4     2020             컴퓨터등사용사기      검거건수    2050    건
5     2020             컴퓨터등사용사기      검거인원    2595    명
6     2020                 부당이득      발생건수      83    건
7     2020                 부당이득      검거건수      67    건
8     2020                 부당이득      검거인원     108    명
9     2020             편의시설부정이용      발생건수    1638    건
10    2020             편의시설부정이용      검거건수    1287    건
11    2020             편의시설부정이용      검거인원    1519    명
12    2020  전기통신금융사기피해금환급에관한특별법      발생건수     534    건
13    2020  전기통신금융사기피해금환급에관한특별법      검거건수     286    건
14    2020  전기통신금융사기피해금환급에관한특별법      검거인원     368    명
15    2020            보험사기방지특별법      발생건수    3523    건
16    2020            보험사기방지특별법      검거건수    3378    건
17    2020            보험사기방지특별법      검거인원   12973    명
PS C:\workspace\api>
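One thing to watch in this output: the `DT` values arrive as strings (the earlier script concatenates them with other strings without any conversion), so the `value` column is text, not numbers. Below is a minimal sketch of a converter. `to_number` is a hypothetical helper, and treating blanks or other non-numeric text as missing is an assumption, since this post does not show how KOSIS marks missing values.

```python
def to_number(dt):
    """Convert a 'DT' string to int or float; return None if it is not numeric."""
    text = dt.strip().replace(",", "")
    if not text:
        return None
    try:
        return int(text)
    except ValueError:
        pass
    try:
        return float(text)
    except ValueError:
        # Assumed: anything that is not a number is treated as a missing value.
        return None


# Values as they appear in the output above, plus two made-up edge cases.
for raw in ["340925", "12.5", "-", ""]:
    print(repr(raw), "->", to_number(raw))
```

With the DataFrame from the script, `pd.to_numeric(df['value'], errors='coerce')` should give a comparable result in one call.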

 

 
