Solving the deep pagination problem when querying large data sets in Elasticsearch

The problem

While building a log analysis dashboard, I hit an error in the feature that exports every log entry in a given time range:

{
  "error": {
    "type": "query_phase_execution_exception",
    "reason": "Result window is too large, from + size must be less than or equal to: [10000]"
  }
}

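By default, Elasticsearch rejects any request where from + size exceeds 10,000 (the index.max_result_window setting). The reason is memory: to serve a page, every shard has to collect its top from + size hits, and the coordinating node then sorts number_of_shards × (from + size) of them, so each deeper page costs more heap and CPU.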
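
To make the limit concrete: with page-number pagination, from = (page - 1) × size, so at 1,000 hits per page the 11th page is already rejected. A small sketch of that arithmetic (the helper exceedsResultWindow is hypothetical; 10,000 is the default index.max_result_window):

```javascript
// Default value of the index.max_result_window setting
const DEFAULT_MAX_RESULT_WINDOW = 10000;

// Hypothetical helper: would a page-number request exceed the result window?
function exceedsResultWindow(page, size, maxWindow = DEFAULT_MAX_RESULT_WINDOW) {
  const from = (page - 1) * size; // offset of the first hit on this page
  return from + size > maxWindow;
}

console.log(exceedsResultWindow(10, 1000)); // false: 9000 + 1000 = 10000 is still allowed
console.log(exceedsResultWindow(11, 1000)); // true: 10000 + 1000 = 11000 is rejected
```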

Approaches I tried

1. Raising max_result_window

PUT /logs/_settings
{
  "index.max_result_window": 50000
}

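This was only a stopgap. Raising the limit doesn't make deep pages any cheaper, and as the data kept growing I would obviously run into the same error again.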

2. Scroll API

const response = await client.search({
  index: 'logs',
  scroll: '1m',
  body: {
    size: 1000,
    query: { match_all: {} }
  }
});

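The Scroll API is meant for batch processing rather than real-time requests: results reflect a snapshot of the index taken at the initial search, and every open scroll context consumes cluster resources (it keeps the old segments it reads from alive) until it expires or is cleared.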
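
For comparison, a complete scroll export has to keep calling the scroll endpoint with the latest _scroll_id and should release the context when it is done. A minimal sketch, assuming the @elastic/elasticsearch client (which returns the response under body) is passed in as client; scrollAllLogs is a hypothetical name:

```javascript
// Hypothetical helper: export every matching document with the Scroll API.
// `client` is assumed to be an @elastic/elasticsearch Client instance.
async function scrollAllLogs(client, query) {
  const results = [];
  let response = await client.search({
    index: 'logs',
    scroll: '1m', // keep the search context alive for 1 minute between calls
    body: { size: 1000, query }
  });
  let scrollId = response.body._scroll_id;

  try {
    while (response.body.hits.hits.length > 0) {
      results.push(...response.body.hits.hits.map(hit => hit._source));
      // Fetch the next batch and extend the context's keep-alive by another minute
      response = await client.scroll({ body: { scroll_id: scrollId, scroll: '1m' } });
      scrollId = response.body._scroll_id;
    }
  } finally {
    // Free the search context on the cluster instead of waiting for it to expire
    await client.clearScroll({ body: { scroll_id: [scrollId] } });
  }
  return results;
}
```

The try/finally is the part that is easy to forget: without clearScroll, each export leaves a context open on the cluster until the keep-alive runs out.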

Final solution: search_after

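search_after paginates with a cursor instead of an offset: each request passes the sort values of the last hit from the previous page, so the cluster never has to collect from + size hits and the max_result_window limit doesn't come into play.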

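// client: an @elastic/elasticsearch Client instance
const getAllLogs = async (client, startDate, endDate) => {
  const PAGE_SIZE = 1000;
  const allResults = [];
  let searchAfter = null;

  while (true) {
    const body = {
      size: PAGE_SIZE,
      // timestamp alone is not unique, so _id is added as a tiebreaker
      sort: [{ timestamp: 'asc' }, { _id: 'asc' }],
      query: { range: { timestamp: { gte: startDate, lte: endDate } } }
    };

    // From the second page on, continue after the last hit of the previous page
    if (searchAfter) {
      body.search_after = searchAfter;
    }

    const response = await client.search({ index: 'logs', body });
    const hits = response.body.hits.hits;

    if (hits.length === 0) break;

    allResults.push(...hits.map(hit => hit._source));
    // The sort values of the last hit become the cursor for the next request
    searchAfter = hits[hits.length - 1].sort;

    // A short page means this was the last one, so skip the extra empty request
    if (hits.length < PAGE_SIZE) break;
  }

  return allResults;
};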
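
getAllLogs still gathers every hit into one array, so memory in the Node.js process grows with the size of the export even though each request to the cluster stays small. If the export can be written out page by page, the same loop could be turned into an async generator. A sketch under that assumption (iterateLogs and its pageSize parameter are hypothetical names; the client is assumed to be the same @elastic/elasticsearch client):

```javascript
// Hypothetical variant of getAllLogs: yield each page instead of collecting everything.
// `client` is assumed to be an @elastic/elasticsearch Client instance.
async function* iterateLogs(client, startDate, endDate, pageSize = 1000) {
  let searchAfter = null;

  while (true) {
    const body = {
      size: pageSize,
      sort: [{ timestamp: 'asc' }, { _id: 'asc' }],
      query: { range: { timestamp: { gte: startDate, lte: endDate } } }
    };
    if (searchAfter) body.search_after = searchAfter;

    const response = await client.search({ index: 'logs', body });
    const hits = response.body.hits.hits;
    if (hits.length === 0) return;

    // The next request is only sent when the consumer asks for the next page
    yield hits.map(hit => hit._source);
    searchAfter = hits[hits.length - 1].sort;
    if (hits.length < pageSize) return;
  }
}
```

A caller would consume it with for await (const page of iterateLogs(client, startDate, endDate)) { ... }, writing each page to the output before the next request is sent.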

Caveats

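The sort must include a unique value. Sorting on timestamp alone may not be enough, since several log entries can share the same timestamp, so I added _id as a tiebreaker.
If the index changes while paginating, some documents may be missed or returned twice.
Combining search_after with a PIT (Point in Time) would be more robust, but the ES version in use (6.8) doesn't support it; PIT was introduced in 7.10.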
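
Why the tiebreaker matters can be shown without a cluster. The sketch below imitates search_after over an in-memory array; the data and helper names are hypothetical, and real Elasticsearch compares sort values per shard. The Elasticsearch docs describe the order of tied documents as undefined, which can lead to missing or duplicate hits; this simulation applies a strict "after the cursor" comparison, which produces the missing-documents case when a page boundary falls inside a group of equal timestamps:

```javascript
// Hypothetical sample data: three log entries share timestamp 2.
const logs = [
  { _id: 'a', timestamp: 1 },
  { _id: 'b', timestamp: 2 },
  { _id: 'c', timestamp: 2 },
  { _id: 'd', timestamp: 2 },
  { _id: 'e', timestamp: 3 },
];

// Compare two arrays of sort values lexicographically (all fields ascending).
function compareSortValues(a, b) {
  for (let i = 0; i < a.length; i++) {
    if (a[i] < b[i]) return -1;
    if (a[i] > b[i]) return 1;
  }
  return 0;
}

// Imitates search_after: sort, keep only hits strictly after the cursor, take `size`.
function fetchPage(docs, sortFields, after, size) {
  return docs
    .map(doc => ({ doc, sort: sortFields.map(field => doc[field]) }))
    .sort((x, y) => compareSortValues(x.sort, y.sort))
    .filter(hit => after === null || compareSortValues(hit.sort, after) > 0)
    .slice(0, size);
}

// Pages through all docs with the given sort fields and returns the ids seen.
function paginate(docs, sortFields, size) {
  const seen = [];
  let after = null;
  while (true) {
    const hits = fetchPage(docs, sortFields, after, size);
    if (hits.length === 0) return seen;
    seen.push(...hits.map(hit => hit.doc._id));
    after = hits[hits.length - 1].sort; // cursor = sort values of the last hit
  }
}

console.log(paginate(logs, ['timestamp'], 2));        // [ 'a', 'b', 'e' ]: 'c' and 'd' are skipped
console.log(paginate(logs, ['timestamp', '_id'], 2)); // [ 'a', 'b', 'c', 'd', 'e' ]
```

With timestamp alone, the cursor after the first page is [2], so every other entry with timestamp 2 is treated as already seen; adding _id makes the cursor [2, 'b'] and nothing is lost.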

Results

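I can now reliably fetch more than 1 million log entries. Because each request asks for a fixed 1,000 hits no matter how deep the export goes, memory usage on the cluster stays flat, which reduced the load on the server.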