Speeding Up API Calls with Python asyncio

The Problem

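A data-sync batch job was taking more than three hours. It called an external API once for each of roughly 5,000 records, and each request took about 2 seconds on average, so this was no surprise: 5,000 × 2 s = 10,000 s, or about 2.8 hours spent simply waiting for responses.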

The original code was a plain for loop.

import requests

def sync_data(items):
    results = []
    for item in items:
        # Each call blocks until its response arrives, so total time ≈ len(items) × latency
        response = requests.get(f'https://api.example.com/data/{item.id}')
        results.append(response.json())
    return results

Adopting asyncio and aiohttp

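With the release of Python 3.7, asyncio had matured (3.7 added asyncio.run(), used below), so I took this opportunity to apply it properly, with aiohttp as the async HTTP client.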

import asyncio
import aiohttp

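async def fetch_data(session, item_id):
    # Awaiting the response yields control, so other requests can progress meanwhile
    async with session.get(f'https://api.example.com/data/{item_id}') as response:
        return await response.json()

async def sync_data_async(items):
    # One shared session reuses connections across all requests
    async with aiohttp.ClientSession() as session:
        # One coroutine per item; gather runs them concurrently and returns results in input order
        tasks = [fetch_data(session, item.id) for item in items]
        results = await asyncio.gather(*tasks)
        return results

# Run (asyncio.run requires Python 3.7+)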
results = asyncio.run(sync_data_async(items))
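To see why this helps without hitting a real API, here is a minimal sketch that replaces the HTTP call with asyncio.sleep. The names fake_fetch and LATENCY are hypothetical stand-ins, and the 2-second latency is scaled down to 0.1 s. Awaiting one call at a time costs roughly n × latency, while gather overlaps the waits so the total stays close to a single latency.

```python
import asyncio
import time

LATENCY = 0.1  # hypothetical per-request latency, scaled down from ~2 s


async def fake_fetch(item_id):
    # Stand-in for session.get(...): just waits, like a slow API response
    await asyncio.sleep(LATENCY)
    return {"id": item_id}


async def sequential(ids):
    # Each call finishes before the next starts, so the waits add up
    return [await fake_fetch(i) for i in ids]


async def concurrent(ids):
    # gather schedules all coroutines at once, so their waits overlap
    return await asyncio.gather(*(fake_fetch(i) for i in ids))


def timed(coro):
    # Run a coroutine to completion and measure wall-clock time
    start = time.perf_counter()
    result = asyncio.run(coro)
    return result, time.perf_counter() - start


ids = list(range(20))
seq_result, seq_time = timed(sequential(ids))
con_result, con_time = timed(concurrent(ids))
print(f"sequential: {seq_time:.2f}s, concurrent: {con_time:.2f}s")
```

Both versions return the same results in the same order; only the waiting is overlapped.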

Limiting Concurrency and the Result

To control how many requests run at once, I added an asyncio.Semaphore and capped concurrency at 50 to avoid overloading the API server.

async def sync_data_async(items, concurrency=50):
    # At most `concurrency` requests may be in flight at any moment
    semaphore = asyncio.Semaphore(concurrency)

    async def bounded_fetch(session, item_id):
        # Wait here until a slot is free; the slot is released when the request finishes
        async with semaphore:
            return await fetch_data(session, item_id)

    async with aiohttp.ClientSession() as session:
        tasks = [bounded_fetch(session, item.id) for item in items]
        results = await asyncio.gather(*tasks)
        return results
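One way to check that the Semaphore really caps concurrency is to count in-flight calls. This sketch mirrors bounded_fetch but swaps the HTTP request for asyncio.sleep and adds a counter; run_bounded, in_flight, and peak are hypothetical names for illustration only.

```python
import asyncio


async def run_bounded(n_items, concurrency, latency=0.01):
    semaphore = asyncio.Semaphore(concurrency)
    in_flight = 0  # coroutines currently inside the semaphore block
    peak = 0       # highest value in_flight ever reached

    async def bounded_fetch(item_id):
        nonlocal in_flight, peak
        async with semaphore:
            # Only `concurrency` coroutines can be inside this block at once
            in_flight += 1
            peak = max(peak, in_flight)
            await asyncio.sleep(latency)  # stand-in for the API call
            in_flight -= 1
            return item_id

    results = await asyncio.gather(*(bounded_fetch(i) for i in range(n_items)))
    return results, peak


results, peak = asyncio.run(run_bounded(n_items=200, concurrency=50))
print(peak)  # never exceeds 50
```

All 200 coroutines are created up front, but the semaphore lets only 50 proceed past `async with` at a time; the rest wait in line.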

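The job that used to take 3 hours now finishes in 20 minutes. Because the bottleneck was waiting on the API server's responses rather than local computation, overlapping requests with async I/O made a clear difference.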

Caveats

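Always check the API's rate limit before choosing a concurrency level.
Handle errors properly so that a few failed requests don't affect the whole batch (by default, the first exception raised inside asyncio.gather propagates to the caller and the other results are lost).
On local Windows development machines, watch out for ProactorEventLoop issues (ProactorEventLoop is the default event loop on Windows since Python 3.8).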
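For the error-handling point, one option (an assumption about how you want failures reported, not the only approach) is asyncio.gather(..., return_exceptions=True): failed calls come back as exception objects in the results list instead of propagating and discarding everyone else's results. The sketch simulates failures with a hypothetical flaky_fetch in place of the real API call.

```python
import asyncio


async def flaky_fetch(item_id):
    # Hypothetical stand-in for the API call: every 10th id fails
    await asyncio.sleep(0.01)
    if item_id % 10 == 0:
        raise RuntimeError(f"request for {item_id} failed")
    return {"id": item_id}


async def sync_all(ids):
    # return_exceptions=True: exceptions are returned in place of results,
    # so one failure does not abort the whole gather
    results = await asyncio.gather(
        *(flaky_fetch(i) for i in ids), return_exceptions=True
    )
    succeeded = [r for r in results if not isinstance(r, Exception)]
    # Results keep input order, so zip pairs each failure with its id
    failed = [(i, r) for i, r in zip(ids, results) if isinstance(r, Exception)]
    return succeeded, failed


succeeded, failed = asyncio.run(sync_all(list(range(1, 51))))
print(len(succeeded), len(failed))  # 45 5
```

The failed ids can then be logged or retried in a separate pass, instead of rerunning the entire batch.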