Fixing a Memory Leak When Streaming OpenAI API Responses

The Problem

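We integrated the GPT-4 API into our in-house AI chatbot service using streaming. It worked fine at first, but once the server had been running for more than a day, we noticed that memory usage kept climbing and never came back down.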

Root Cause Analysis

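Analyzing a Node.js heap dump showed that stream objects from the OpenAI SDK were not being cleaned up and stayed in memory. In particular, when a user disconnected while a response was still being streamed, the stream was never properly terminated.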

// Original code (problematic): nothing stops the loop when the client disconnects
const stream = await openai.chat.completions.create({
  model: 'gpt-4',
  messages: messages,
  stream: true,
});

for await (const chunk of stream) {
  res.write(chunk.choices[0]?.delta?.content || '');
}
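
For readers who want to reproduce the heap analysis described above, here is a minimal sketch of capturing a heap snapshot from a running Node.js process. It uses the built-in `v8.writeHeapSnapshot()`; the helper name `captureHeapSnapshot`, the file naming scheme, and the `SIGUSR2` trigger are assumptions for illustration, not what our service actually used.

```javascript
const v8 = require('node:v8');
const os = require('node:os');
const path = require('node:path');

// Hypothetical helper: writes a heap snapshot and returns the file path.
// Note that writing a snapshot blocks the event loop while it runs.
function captureHeapSnapshot(dir = os.tmpdir()) {
  const file = path.join(
    dir,
    `heap-${process.pid}-${Date.now()}.heapsnapshot`,
  );
  return v8.writeHeapSnapshot(file);
}

// Assumed trigger: send SIGUSR2 to the process (e.g. `kill -USR2 <pid>`)
// to take a snapshot on demand.
process.on('SIGUSR2', () => {
  console.log(`heap snapshot written to ${captureHeapSnapshot()}`);
});
```

Taking two snapshots some hours apart and comparing them in Chrome DevTools (Memory tab) is one way to see which object types keep accumulating.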

The Fix

We changed the handler to detect when the client connection closes and to explicitly terminate the stream when that happens.

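// Create the controller and register the listener before sending the request,
// so a disconnect during the initial API call is handled as well.
const abortController = new AbortController();

// res 'close' fires when the connection is terminated early or the response
// has finished; aborting after normal completion is harmless.
// (On Node.js 16+, req 'close' signals that the request completed,
// not necessarily that the client disconnected.)
res.on('close', () => abortController.abort());

try {
  const stream = await openai.chat.completions.create(
    {
      model: 'gpt-4',
      messages: messages,
      stream: true,
    },
    // Request option: aborting this signal cancels the underlying HTTP request.
    { signal: abortController.signal },
  );

  for await (const chunk of stream) {
    if (abortController.signal.aborted) break;
    res.write(chunk.choices[0]?.delta?.content || '');
  }
} catch (error) {
  // Errors caused by our own abort are expected; rethrow anything else.
  if (!abortController.signal.aborted) throw error;
} finally {
  res.end();
}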
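
To see the cleanup pattern in isolation, the following sketch swaps the OpenAI SDK for a stand-in async generator and the HTTP connection for an `EventEmitter`. All names here (`fakeTokenStream`, `pipeTokens`, `demo`, the `state.released` flag) are hypothetical. It only demonstrates that aborting on a close event ends the loop early and lets the producer's `finally` block run.

```javascript
const { EventEmitter } = require('node:events');

// Stand-in for an upstream token stream; this is NOT the OpenAI SDK.
async function* fakeTokenStream(signal, state) {
  try {
    for (let i = 0; i < 1000; i++) {
      if (signal.aborted) return;
      await new Promise((resolve) => setTimeout(resolve, 1));
      yield `token-${i}`;
    }
  } finally {
    // Stands in for releasing the upstream HTTP connection.
    state.released = true;
  }
}

// Forwards tokens to `write` until the stream ends or `client` emits 'close'.
async function pipeTokens(client, write) {
  const controller = new AbortController();
  const state = { released: false };
  const onClose = () => controller.abort();
  client.once('close', onClose);

  let written = 0;
  try {
    for await (const token of fakeTokenStream(controller.signal, state)) {
      if (controller.signal.aborted) break;
      write(token);
      written++;
    }
  } finally {
    // Remove the listener so the handler itself does not linger.
    client.off('close', onClose);
  }
  return { written, released: state.released };
}

// Simulates a client that disconnects about 20 ms into the response.
async function demo() {
  const client = new EventEmitter();
  const chunks = [];
  setTimeout(() => client.emit('close'), 20);
  return pipeTokens(client, (token) => chunks.push(token));
}
```

Calling `demo()` resolves with far fewer than 1000 tokens written and `released` set to `true`. Without the abort, the loop would keep consuming the producer until all 1000 tokens had been generated, even though nobody was listening.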

Results

Memory usage stabilized, and no problems occurred even during extended operation. The lesson we took away: whenever you use a streaming API, always implement the cleanup logic along with it.
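
One simple way to check that memory has in fact leveled off is to log heap usage periodically and watch the trend over a day or more. The sketch below uses the built-in `process.memoryUsage()`; the helper name `startHeapSampler`, the one-minute default interval, and the log format are assumptions for illustration.

```javascript
// Hypothetical helper: samples heap usage at a fixed interval.
function startHeapSampler(intervalMs = 60_000, log = console.log) {
  const samples = [];
  const toMiB = (bytes) => (bytes / (1024 * 1024)).toFixed(1);

  const timer = setInterval(() => {
    const { heapUsed, rss } = process.memoryUsage();
    samples.push({ time: Date.now(), heapUsed, rss });
    log(`heapUsed=${toMiB(heapUsed)} MiB rss=${toMiB(rss)} MiB`);
  }, intervalMs);

  // Do not keep the process alive just for sampling.
  timer.unref();

  return { samples, stop: () => clearInterval(timer) };
}
```

In a long-running server the `samples` array should be capped or shipped to a metrics system, otherwise the sampler itself grows without bound.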