Fixing a Memory Leak When Handling OpenAI API Streaming Responses

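Problem

We integrated the GPT-4 API with streaming into our internal chatbot service. After long periods of operation, we found that memory usage on the Node.js server kept climbing. It was especially severe when users left or refreshed the page in the middle of a response.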

Root Cause Analysis

The original code looked like this.

const response = await openai.chat.completions.create({
  model: 'gpt-4',
  messages,
  stream: true,
});

// Forward each token to the client as a server-sent event.
// Note: nothing here stops the loop when the client disconnects.
for await (const chunk of response) {
  const content = chunk.choices[0]?.delta?.content;
  if (content) {
    res.write(`data: ${JSON.stringify({ content })}\n\n`);
  }
}

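Even after the client disconnected, the server kept receiving the stream from the OpenAI API. We did listen for the res.on('close') event, but the handler never cancelled the OpenAI API request itself. As a result, the for await loop kept running until the model finished generating, and everything the loop referenced (the response object, the messages array, incoming chunks) stayed in memory for that whole time.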

Solution

We used an AbortController so that the API request is cancelled together with the client connection when it closes.

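const abortController = new AbortController();

// Cancel the upstream OpenAI request as soon as the client connection closes.
res.on('close', () => {
  abortController.abort();
});

try {
  // create() sits inside the try block because an abort that fires before
  // the response arrives rejects this call as well.
  const response = await openai.chat.completions.create({
    model: 'gpt-4',
    messages,
    stream: true,
  }, {
    signal: abortController.signal,
  });

  for await (const chunk of response) {
    const content = chunk.choices[0]?.delta?.content;
    if (content) {
      res.write(`data: ${JSON.stringify({ content })}\n\n`);
    }
  }
} catch (error) {
  // Check the signal as well as error.name: the name of the error raised
  // on abort may differ between SDK versions.
  if (abortController.signal.aborted || error.name === 'AbortError') {
    console.log('Stream aborted by client');
    return;
  }
  throw error;
}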
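
To see the cancellation pattern in isolation, here is a minimal sketch that does not depend on the OpenAI SDK. The names fakeTokenStream, pipeStream, and demo are hypothetical and exist only for illustration: fakeTokenStream stands in for the API stream, and the controller.abort() call inside the write callback plays the role of the res.on('close') handler.

```javascript
// Hypothetical stand-in for the OpenAI stream: yields tokens until the signal aborts.
async function* fakeTokenStream(signal, total = 100) {
  for (let i = 0; i < total; i++) {
    if (signal.aborted) return; // upstream stops producing once cancelled
    await new Promise((resolve) => setTimeout(resolve, 1));
    yield `token-${i}`;
  }
}

// Hypothetical helper: forwards tokens to `write` and returns how many were sent.
async function pipeStream(signal, write) {
  let sent = 0;
  for await (const token of fakeTokenStream(signal)) {
    if (signal.aborted) break;
    write(token);
    sent++;
  }
  return sent;
}

// Simulate a client that disconnects after receiving five tokens.
async function demo() {
  const controller = new AbortController();
  const received = [];
  const sent = await pipeStream(controller.signal, (token) => {
    received.push(token);
    if (received.length === 5) controller.abort(); // stands in for res.on('close')
  });
  return { sent, received };
}
```

Calling demo() resolves after five tokens rather than one hundred, which is the behavior the fix aims for: once the consumer is gone, the producer stops too.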

Results

After deployment, 24 hours of monitoring showed memory usage holding steady. At 100 concurrent users, memory usage was about 40% lower than before.
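
This post does not go into how the memory figures were collected. As one possible approach, the sketch below samples Node's built-in process.memoryUsage() at a fixed interval and logs it as JSON. The helper names toMiB, sampleMemory, and startMemoryLogger, as well as the one-minute default interval, are assumptions made for illustration.

```javascript
// Hypothetical helper: converts bytes to MiB, rounded to one decimal place.
function toMiB(bytes) {
  return Math.round((bytes / 1024 / 1024) * 10) / 10;
}

// Takes one snapshot of resident set size and used heap.
function sampleMemory() {
  const { rss, heapUsed } = process.memoryUsage();
  return {
    at: new Date().toISOString(),
    rssMiB: toMiB(rss),
    heapUsedMiB: toMiB(heapUsed),
  };
}

// Logs a snapshot every intervalMs and returns a function that stops the logger.
function startMemoryLogger(intervalMs = 60_000, log = console.log) {
  const timer = setInterval(() => log(JSON.stringify(sampleMemory())), intervalMs);
  timer.unref(); // do not keep the process alive just for logging
  return () => clearInterval(timer);
}
```

Comparing heapUsedMiB over several hours before and after a change is usually enough to tell a steady climb from a stable plateau.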

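OpenAI API costs also went down. Because generation stops once a request is aborted, we no longer pay for output tokens that would have been produced after the client had already left. Monthly API costs fell by about 15%.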