Fixing a Memory Leak When Streaming OpenAI API Responses

The Problem

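While handling streaming responses from the OpenAI API in our chat feature, memory usage kept growing. The more conversations users had, the more heap memory the Node.js process held without releasing it, until the server eventually had to be restarted.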

Root Cause

The original code kept appending every chunk to an array as the streamed response arrived.

const chunks = [];
const stream = await openai.chat.completions.create({
  model: 'gpt-4o',
  messages: messages,
  stream: true,
});

for await (const chunk of stream) {
  chunks.push(chunk); // problem: every chunk object stays referenced
  const content = chunk.choices[0]?.delta?.content;
  if (content) res.write(content);
}

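There were two problems:

Every chunk was kept in memory even though its content was already being sent to the client in real time.
The stream was not cleaned up properly when the client disconnected.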
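
To make the first problem concrete, the sketch below compares what a handler retains when it keeps every chunk object with what it retains when it keeps only the extracted text. The chunk shape built by `makeChunk` is a simplified, hypothetical stand-in for a streamed chat-completion chunk, and `retainedSize` uses JSON length only as a rough proxy for memory, not as an exact heap measurement.

```javascript
// Hypothetical, simplified stand-in for a streamed chat-completion chunk.
function makeChunk(text, index) {
  return {
    id: `chatcmpl-example-${index}`,
    object: 'chat.completion.chunk',
    created: 1700000000,
    model: 'gpt-4o',
    choices: [{ index: 0, delta: { content: text }, finish_reason: null }],
  };
}

// Rough proxy for retained size: characters in the JSON serialization.
function retainedSize(value) {
  return JSON.stringify(value).length;
}

const pieces = ['Hel', 'lo', ', ', 'wor', 'ld', '!'];
const chunks = pieces.map(makeChunk);

// Approach of the original handler: keep every chunk object.
const keptChunks = [...chunks];

// Alternative: keep only the text that is actually needed.
let keptText = '';
for (const chunk of chunks) {
  keptText += chunk.choices[0]?.delta?.content ?? '';
}

console.log('chunk objects:', retainedSize(keptChunks), 'text only:', retainedSize(keptText));
```

Most of each chunk is metadata that repeats on every piece, so the per-chunk overhead is much larger than the few characters of text it carries.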

The Fix

We removed the chunk array and changed the handler to extract and forward only the data it needs.

const controller = new AbortController();

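// Abort the upstream request when the client connection goes away.
// res 'close' is used here because on newer Node.js versions req 'close'
// can fire once the request has been fully read, not only on disconnect.
// Aborting after the response has already completed is harmless.
res.on('close', () => {
  controller.abort();
});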

try {
  const stream = await openai.chat.completions.create({
    model: 'gpt-4o',
    messages: messages,
    stream: true,
  }, {
    signal: controller.signal
  });

  let fullContent = '';
  
  for await (const chunk of stream) {
    if (controller.signal.aborted) break;
    
    const content = chunk.choices[0]?.delta?.content || '';
    if (content) {
      fullContent += content;
      res.write(content);
    }
  }
  
  // Only if needed, e.g. to persist the message to the database
  await saveMessage(fullContent);
  
} catch (error) {
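  // The SDK's abort error may not be named 'AbortError', so also check the signal.
  if (controller.signal.aborted || error.name === 'AbortError') {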
    console.log('Stream aborted by client');
    return;
  }
  throw error;
}

The AbortController propagates a client disconnect to the upstream request, and the unnecessary chunk retention is gone.
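
The cancellation pattern can be tried in isolation, without the OpenAI SDK or an HTTP server. The following is a minimal sketch under stated assumptions: `fakeStream` is a hypothetical async generator standing in for the upstream stream, and `consume` simulates the client disconnecting after a given number of pieces. It is not the SDK's actual behavior, only an illustration of how an aborted signal ends a `for await` loop early.

```javascript
// Hypothetical stand-in for an upstream stream: yields pieces until aborted.
async function* fakeStream(pieces, signal) {
  for (const piece of pieces) {
    if (signal.aborted) return; // stop producing once the consumer is gone
    await new Promise((resolve) => setImmediate(resolve)); // simulate latency
    yield piece;
  }
}

// Consumes the stream and simulates a client disconnect
// after `disconnectAfter` pieces have been written.
async function consume(pieces, disconnectAfter) {
  const controller = new AbortController();
  const written = [];

  for await (const piece of fakeStream(pieces, controller.signal)) {
    written.push(piece); // stands in for res.write(piece)
    if (written.length === disconnectAfter) {
      controller.abort(); // stands in for the 'close' event handler
    }
  }

  return { written, aborted: controller.signal.aborted };
}

consume(['a', 'b', 'c', 'd'], 2).then((result) => console.log(result));
```

Once the signal is aborted, the generator returns on its next step, so no further pieces are produced or held.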

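Results

Memory usage stabilized. Over 100 conversations it dropped from 1.2 GB to 300 MB, and garbage collection worked normally again. The key lesson was to buffer a streamed response only when needed, and only as much as needed.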
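
This post does not say how the figures above were measured. One way to take comparable readings in a Node.js process is `process.memoryUsage()`. The sketch below is only an assumption about how such a check could look. `heapUsedMb` and `logHeap` are hypothetical helper names.

```javascript
// Returns the current V8 heap usage in megabytes, rounded to one decimal place.
function heapUsedMb() {
  const bytes = process.memoryUsage().heapUsed;
  return Math.round((bytes / 1024 / 1024) * 10) / 10;
}

// Hypothetical helper: logs heap usage with a label,
// for example at startup and again after every N conversations.
function logHeap(label) {
  const mb = heapUsedMb();
  console.log(`[heap] ${label}: ${mb} MB`);
  return mb;
}

logHeap('startup');
```

A single reading says little on its own. What matters is whether the value keeps climbing across many conversations or levels off after garbage collection.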