    6f6b4c75
    api-pmh Add connection caching · 6f6b4c75
    Paul Millar authored
    Motivation:
    
    The OAI-PMH endpoints being queried behave so that the client makes a
    large number of requests, each returning a relatively small amount of
    data.
    
    When requests are processed by the OAI-PMH server quickly, the overhead
    for establishing the TCP and TLS connections can be very significant.
    
    Connection caching (sometimes called HTTP Keep Alive) involves sending
    multiple HTTP requests over a single TCP connection, allowing us to
    ameliorate the connection overhead by (effectively) spreading the cost
    over all OAI-PMH requests.
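
The effect of connection reuse can be seen in a minimal sketch. It does not use `persistent_http` or `persistent_httparty`; it swaps in Ruby's standard `Net::HTTP` and a throwaway local server standing in for an OAI-PMH endpoint. The helper names (`start_counting_server`, `get_separately`, `get_persistently`, `compare_connection_counts`) are hypothetical and exist only for this illustration.

```ruby
require "net/http"
require "socket"

# Hypothetical stand-in for an OAI-PMH endpoint: a minimal local HTTP/1.1
# server that records every TCP connection it accepts.
def start_counting_server
  server = TCPServer.new("127.0.0.1", 0) # port 0: let the OS pick a free port
  accepted = []
  Thread.new do
    loop do
      sock = server.accept
      accepted << sock
      Thread.new(sock) do |s|
        begin
          # Answer requests on this socket until the client closes it.
          while s.gets
            # Skip the remaining request headers up to the blank line.
            while (line = s.gets) && line != "\r\n"
            end
            s.write("HTTP/1.1 200 OK\r\nContent-Length: 2\r\n\r\nok")
          end
        rescue IOError, SystemCallError
          # The client went away abruptly; nothing more to do for this socket.
        ensure
          s.close
        end
      end
    end
  end
  [server, accepted]
end

# One fresh Net::HTTP session per request: a new TCP connection each time.
def get_separately(port, count)
  count.times do
    Net::HTTP.new("127.0.0.1", port, nil).start { |http| http.get("/") }
  end
end

# One started session for all requests: they share a single TCP connection.
def get_persistently(port, count)
  Net::HTTP.new("127.0.0.1", port, nil).start do |http|
    count.times { http.get("/") }
  end
end

# Returns how many connections the server saw for each strategy.
def compare_connection_counts(requests)
  server, accepted = start_counting_server
  port = server.addr[1]
  get_separately(port, requests)
  separate = accepted.size
  get_persistently(port, requests)
  { separate: separate, persistent: accepted.size - separate }
end

counts = compare_connection_counts(3)
puts "separate sessions: #{counts[:separate]} connections"
puts "persistent session: #{counts[:persistent]} connection(s)"
```

With TLS in play, each avoided connection also avoids a handshake, which is where most of the saving for a fast endpoint would be expected to come from.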
    
    Modification:
    
    Update client to use `persistent_http` connection pool, via the
    `persistent_httparty` adapter.
    
    A bug was discovered, where the host entity is cached between successive
    requests.
    
    Result:
    
    OAI-PMH requests are now faster.  Some observed speedups per request are
    (0.12 +/- 0.02) s, (0.16 +/- 0.01) s and (0.16 +/- 0.03) s for ESRF, HZB
    and HZDR respectively (measured with ListIdentifiers request on Dublin
    Core, following the resumptionToken).
    
    The overall impact of this improvement depends on how long the OAI-PMH
    endpoint takes to process a request.  For the above endpoints, the
    percentage improvements (per request) are 12%, 42% and 70% respectively.
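
As a rough cross-check of how the absolute and relative figures fit together, the sketch below back-calculates the per-request times they imply. It assumes each percentage is the saving relative to the per-request time before the change (the message does not state the baseline), and it uses only the central values, ignoring the quoted uncertainties. The name `implied_times` is hypothetical.

```ruby
# Per-request saving (seconds) and relative improvement, as quoted above.
MEASUREMENTS = {
  "ESRF" => { saved_s: 0.12, improvement: 0.12 },
  "HZB"  => { saved_s: 0.16, improvement: 0.42 },
  "HZDR" => { saved_s: 0.16, improvement: 0.70 }
}.freeze

# Assumption: improvement = saved / time_before,
# hence time_before = saved / improvement.
def implied_times(saved_s, improvement)
  before = saved_s / improvement
  { before_s: before, after_s: before - saved_s }
end

MEASUREMENTS.each do |site, m|
  t = implied_times(m[:saved_s], m[:improvement])
  puts format("%-5s before ~%.2f s, after ~%.2f s", site, t[:before_s], t[:after_s])
end
```

Under that assumption the implied per-request times before the change are roughly 1.0 s, 0.38 s and 0.23 s: a similar absolute saving matters far more for an endpoint that answers quickly.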