A warm welcome to libh2o

Contents

1  Hello, and welcome to libh2o
2  Basics
3  HTTP/2 notes
4  Data types & storage
5  Hello, world
6  Object lifetime, memory management
7  Config, context, accept context
  7.1  Global configuration
  7.2  Accept context
  7.3  Context
8  Generators
9  Serving files
10  Passing data to handlers and generators
  10.1  To a generator
11  Logging
12  SSL / TLS
13  HTTP/2 specific features
14  Event loops (libuv, native)
15  H2O as a client library
16  H2O multithreading
17  Notes on asynchronous operations
  17.1  Stop and proceed
  17.2  Asynchronous

   

Hello, and welcome to libh2o

NOTE: This is very early work & may have a lot of mistakes. If you find any, please let me know at bert.hubert@powerdns.com or @PowerDNS_Bert.

H2O is a powerful web server used and developed by Fastly. Its mission:

H2O is a new generation HTTP server that provides quicker response to users with less CPU utilization when compared to older generation of web servers. Designed from ground-up, the server takes full advantage of HTTP/2 features including prioritized content serving and server push, promising outstanding experience to the visitors of your web site.

H2O as a web server is very impressive and is seeing more and more use. H2O is powered by libh2o, which contains all the web serving power of H2O itself. libh2o is also packaged separately for third party use. It is MIT licensed.

I have used more web server frameworks than is reasonable. libh2o is by far the most impressive effort I have ever seen, possibly because as far as I know, it is the only web serving library that also powers a leading web serving project.

The one downside of libh2o is that its documentation is currently very spread out. Parts live in the h2o.h include file, other parts can be found in answers in the GitHub issue tracker, in issues tagged with 'FAQ'. A great (if slightly outdated) introduction can also be found in this presentation.

Here, I hope to provide an introduction to libh2o that is correct, if not necessarily complete. Over time, the goal is to maintain correctness while coverage increases.

The H2O maintainers are incredibly friendly and helpful. With this document I hope to help them along in improving (lib)h2o.

If you want to help, this file can be edited on GitHub.

   

Basics

H2O comes with a full & leading implementation of HTTP/2. It strives to make HTTP/2 as fast as possible, to the point it can be a universal method of performing calls and transferring data.

Within H2O, HTTP requests are 'events' within connections. Events end up with a handler, and handlers use generators to send content.

Of specific note, unlike most libraries, libh2o can be used fully asynchronously. This means that your code can receive an HTTP request & go chew on it. Meanwhile, other HTTP requests come in and also get served. Once your code is done with the earlier HTTP request, it can send out the response.

This is specifically relevant since HTTP/2 is fundamentally asynchronous too: many requests may come in at the same time over a single connection. These all need to get serviced in parallel, and not sequentially.

   

HTTP/2 notes

By default libh2o is protocol agnostic. Your code does not need to know if it is dealing with HTTP/1 or HTTP/2, and out of the box both are available. The behaviour of how HTTP/1 sessions might be upgraded to HTTP/2 and the interaction with TLS/SSL can of course be tweaked.

Specific HTTP/2 features such as server push are of course only available on HTTP/2 connections.

   

Data types & storage

libh2o is written in C. This makes the library very accessible, but also means that H2O has to create some data structures from scratch. Here are a few types you will encounter a lot:

Memory allocation is always difficult. To make it easier, libh2o offers us pools associated with HTTP requests and connections. Any data allocated to a pool will get released automatically when the pool gets disposed.
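One type that shows up right away in the examples is h2o_iovec_t, a string expressed as a pointer plus an explicit length, usually built with h2o_iovec_init and the H2O_STRLIT macro. The sketch below shows the idea with stand-in names (my_iovec_t, MY_STRLIT, my_iovec_init, my_memis). These names are assumptions made for illustration and are not the libh2o definitions themselves, so treat this as a rough model rather than the library's code.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in mirroring the shape of h2o_iovec_t:
 * a pointer plus an explicit length, no NUL terminator required. */
typedef struct {
    char *base;
    size_t len;
} my_iovec_t;

/* Same idea as H2O_STRLIT: expand a string literal into "pointer, length". */
#define MY_STRLIT(s) (s), (sizeof(s) - 1)

static my_iovec_t my_iovec_init(const void *base, size_t len)
{
    my_iovec_t v;
    assert(base != NULL || len == 0);
    v.base = (char *)base;
    v.len = len;
    return v;
}

/* Length-aware comparison in the spirit of h2o_memis: returns 1 on equality. */
static int my_memis(const void *target, size_t target_len, const void *test, size_t test_len)
{
    if (target_len != test_len)
        return 0;
    return memcmp(target, test, test_len) == 0;
}
```

Because the length travels with the pointer, such a string can point into the middle of a larger buffer (for example a request line) without copying or terminating it.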

   

Hello, world

The libh2o “Hello World” example server is longer than you may be used to. This is because h2o is a very powerful library and it needs to be generic enough to cover a lot of use cases. Some simpler libraries may allow you to be up and running in 12 lines but leave you hanging if you need to set a socket option.

A basic libh2o web server takes around 80 lines to deliver a fully functional web server. What follows is a slightly reordered version of simple.c which in turn is a slightly modified version of simple.c from the h2o examples directory.

If you want to get going quickly, there is a Makefile. Simply clone the repository and run make.

First the basics:

static h2o_globalconf_t config;
static h2o_context_t ctx;
static h2o_accept_ctx_t accept_ctx;

These three contain global h2o configuration, a context for your web server plus an accept context for your listener thread. Note that the static is not accidental: objects with static storage start out zeroed. If you allocate these structs in any other way, be sure to zero them before using them!

Then to get going:

h2o_config_init(&config);
h2o_hostconf_t* hostconf = h2o_config_register_host(&config, h2o_iovec_init(H2O_STRLIT("default")), 65535);

This initializes the global config to default values and it registers a host called 'default'. Note that this accepts a h2o_iovec_t as described above. When registering the host we also pass a port, where 65535 means 'the default port for the protocol used', or in other words '80' or '443'.

If only one host is registered, all of this is moot, since all queries, regardless of host name or port, then end up with the first host.

When registering multiple host names, a single registration is enough when using standard ports (80, 443). If non-standard ports are used, you must register a name twice, once for each port (say, 8000, 4430).

Next up we register a path for "/hello". We then create a handler for this path, and we hook our function helloHandler up to it:

h2o_pathconf_t *pathconf = h2o_config_register_path(hostconf, "/hello", 0);
h2o_handler_t *handler = h2o_create_handler(pathconf, sizeof(*handler));
handler->on_req = helloHandler;

Note that paths are checked in order. This means that if the "/" path is registered first, no subsequent paths are ever going to match. So register paths from most specific to most general.
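To see why the order matters, here is a small self-contained model of a first-match prefix lookup. It is only a sketch of the behaviour described above, not libh2o's actual matching code; first_matching_path is a hypothetical helper.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical model of "first registered prefix wins".
 * Returns the index of the first registered path that is a prefix of
 * request_path, or -1 if none matches. */
static int first_matching_path(const char *const *registered, size_t count, const char *request_path)
{
    size_t i;

    assert(registered != NULL && request_path != NULL);
    for (i = 0; i != count; ++i) {
        size_t n = strlen(registered[i]);
        if (strncmp(request_path, registered[i], n) == 0)
            return (int)i;
    }
    return -1;
}
```

With the registration order {"/", "/hello"}, a request for "/hello" resolves to index 0, the "/" entry, so the "/hello" handler is never reached. With {"/hello", "/"} the same request resolves to the "/hello" entry and everything else falls through to "/".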

We then initialize the global ctx with the global config.

h2o_context_init(&ctx, h2o_evloop_create(), &config);
accept_ctx.ctx = &ctx;
accept_ctx.hosts = config.hosts;

Next up, listening and setting up the event loop:

    struct sockaddr_in addr;
    int fd, reuseaddr_flag = 1;
    h2o_socket_t *sock;

    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(0x7f000001);
    addr.sin_port = htons(12345);

    if ((fd = socket(AF_INET, SOCK_STREAM, 0)) == -1 ||
        setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &reuseaddr_flag, sizeof(reuseaddr_flag)) != 0 ||
        bind(fd, (struct sockaddr *)&addr, sizeof(addr)) != 0 || listen(fd, SOMAXCONN) != 0) {
        return -1;
    }

This is the usual longwinded C way of creating a socket, in this case, one that listens on 127.0.0.1:12345.

    sock = h2o_evloop_socket_create(ctx.loop, fd, H2O_SOCKET_FLAG_DONT_READ);
    h2o_socket_read_start(sock, on_accept);

This is the h2o specific bit, where we add the socket we created to the event loop. We tell h2o not to read from this socket itself, but to call our handler on_accept once it becomes readable. This is what happens there:

static void on_accept(h2o_socket_t *listener, const char *err)
{
    h2o_socket_t *sock;

    if (err != NULL) {
        return;
    }

    if ((sock = h2o_evloop_socket_accept(listener)) == NULL)
        return;
    h2o_accept(&accept_ctx, sock);
}

This is effectively a filter that calls accept(2) on the listener socket (through h2o_evloop_socket_accept) and then tells h2o to accept this new socket as well.

Once all this is set up we can enter the h2o event loop:

    while (h2o_evloop_run(ctx.loop, INT32_MAX) == 0)
        ;

Here is the helloHandler function we hooked above:

static int helloHandler(h2o_handler_t* handler, h2o_req_t* req)
{
  if (!h2o_memis(req->method.base, req->method.len, H2O_STRLIT("GET")))
    return -1;
  
  req->res.status = 200;
  req->res.reason = "OK";
  h2o_add_header(&req->pool, &req->res.headers, H2O_TOKEN_CONTENT_TYPE, 
    NULL, H2O_STRLIT("text/plain"));    
  h2o_send_inline(req, H2O_STRLIT("Hello, world\n"));
  
  return 0;
}

Note that this uses h2o_memis to check it is getting a GET request, and if it isn't, the handler returns -1 to indicate it did not handle the request.

We then set status, reason, content-type, and use the helper function h2o_send_inline to send a response.

h2o_send_inline is a slower convenience function for sending chunks of memory to a client. Behind this function hide generators which are a more generic and faster way of sending data, without having to supply all data in one go.

Finally, please note that libh2o is event driven. If our helloHandler sleeps for a second, nothing else will happen during that second! Please read on for how to deal with slow data sources.

   

Object lifetime, memory management

As noted, memory management is hard. Within libh2o most management is automated for you. For example, when you get passed a request (h2o_req_t), it is not your responsibility to free it when you are done.

And in fact, you can use a request to allocate data associated with that request, and this will be deallocated if for whatever reason the request is done or goes away.

As an example, to allocate an stdio FILE pointer:

FILE** fpptr = h2o_mem_alloc_shared(&req->pool, sizeof(FILE*), on_FILE_close);
*fpptr = fopen("/etc/motd", "r");
// do things with file

Note that we also passed a callback on_FILE_close and it looks like this:

static void on_FILE_close(void *ptr)
{
  FILE** fpptr = (FILE**) ptr;
  if(*fpptr)
    fclose(*fpptr);
}

When used like this, we can be sure that memory for our file pointer is cleaned up, AND that the file pointer will get closed when the request is done or otherwise disposed of. Such automated memory and resource management is vital when using asynchronous operations.
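The mechanism behind this can be pictured as a list of allocations that each carry an optional dispose callback, all released together when the pool is cleared. The toy pool below is a sketch under that assumption. toy_pool_t, toy_pool_alloc, toy_pool_clear and bump_counter are hypothetical names, and the real libh2o pool is more sophisticated than this.

```c
#include <assert.h>
#include <stdlib.h>

typedef void (*dispose_fn)(void *);

/* Header placed in front of every allocation; user data follows it.
 * Note: user data is only aligned as strictly as this header. */
struct tracked {
    struct tracked *next;
    dispose_fn dispose;
};

typedef struct {
    struct tracked *head;
} toy_pool_t;

/* Allocate sz bytes that live until the pool is cleared. */
static void *toy_pool_alloc(toy_pool_t *pool, size_t sz, dispose_fn dispose)
{
    struct tracked *t;

    assert(pool != NULL);
    t = malloc(sizeof(*t) + sz);
    if (t == NULL)
        return NULL;
    t->dispose = dispose;
    t->next = pool->head;
    pool->head = t;
    return t + 1;
}

/* Run every dispose callback, then free the memory itself. */
static void toy_pool_clear(toy_pool_t *pool)
{
    while (pool->head != NULL) {
        struct tracked *t = pool->head;
        pool->head = t->next;
        if (t->dispose != NULL)
            t->dispose(t + 1);
        free(t);
    }
}

/* Example callback: the allocation holds a pointer to a counter,
 * which is incremented when the allocation is released. */
static void bump_counter(void *ptr)
{
    int **counter = ptr;
    if (*counter != NULL)
        ++**counter;
}
```

In the FILE example above, on_FILE_close plays the role of bump_counter: it receives a pointer to the allocated slot, not the value stored in it, which is why it has to dereference once before calling fclose.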

   

Config, context, accept context

In this small sample, we've been passing around an H2O config, a context and an accept context.

NOTE: This section is quite speculative

   

Global configuration

The global configuration actually is not global, as in, it is possible to run several completely independent h2o instances within a single binary. It is however also possible to have a single instance that serves on several listening sockets and also offers multiple servers in there, and these then share the 'global' configuration.

The global configuration specifies things like HTTP/1 or HTTP/2 timeouts, if HTTP/1 connections should attempt an upgrade to HTTP/2, maximum size of POST data and many other settings. The defaults are good though.

   

Accept context

An accept context can be plaintext or a TLS context. One accept context can host multiple sockets, and sockets are registered to a context.

Note that hosts that are registered exist within all contexts, and thus also on all sockets.

   

Context

The context owns the event loop, to which multiple sockets can report, that in turn can report to different accept contexts.

It also keeps counters for HTTP/1 and HTTP/2 connections.

   

Generators

As noted above, h2o_send_inline is a convenience function for sending data in response to an HTTP request. If we want to send a fixed blob of data we already have in memory this is fine, but if we for example want to send the large output of a SQL query, which arrives in separate rows, this is not the best interface.

The 'long form' of sending data to a request is by setting a generator:

    static h2o_generator_t generator = {proceedSending, stopSending};
    h2o_start_response(req, &generator);

    h2o_iovec_t buf;
    
    // gather initial data from SQL query, store it in buf
    
    h2o_send(req, &buf, 1, H2O_SEND_STATE_IN_PROGRESS);

First we register two callbacks: proceedSending and stopSending. The last one is called when the HTTP session went away for any reason, and offers us the ability to close down the SQL session or perform any other cleanup.

Note: As elsewhere, the static is not optional. When we pass a pointer to the generator, libh2o makes no copy for us. Without static, our generator would soon be pointing to dangling memory. There is a reason for this behaviour which is explained below.

We then gather some initial data and send that using h2o_send. In this syntax we can pass an array of h2o_iovec_t buffers, but here we pass just 1. The final parameter H2O_SEND_STATE_IN_PROGRESS means that at a later stage we might have more data to offer.

Importantly, while h2o_send is working on the data, you must not touch or free the data you passed to be sent: libh2o made no copy for you. This is in contrast to h2o_send_inline, which as a convenience does make a copy for you.

Once libh2o has sent out the initial data, it will call our proceedSending callback to ask for more. That is also the moment we can free any memory we allocated for h2o_send. If in proceedSending we find that we've hit the end of the data we want to serve, we emit a final call to h2o_send with as final parameter H2O_SEND_STATE_FINAL.

We are not guaranteed to get a call to stopSending, so any cleanup should be performed once we pass H2O_SEND_STATE_FINAL.
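The pull-style flow described in this section can be modelled without libh2o. In the sketch below every name (toy_generator_t, toy_send, toy_run, TOY_IN_PROGRESS, TOY_FINAL) is a hypothetical stand-in. toy_run plays the part of the event loop asking for more data, and, as a simplification, all rows are sent from the proceed callback rather than the first one being sent from the handler.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum toy_send_state { TOY_IN_PROGRESS, TOY_FINAL };

typedef struct toy_generator toy_generator_t;
struct toy_generator {
    void (*proceed)(toy_generator_t *self); /* "give me more data" */
    const char *const *rows;                /* our slow data source */
    size_t row_count;
    size_t next_row;
    char out[256];                          /* stands in for the client */
    int finished;
};

/* Stand-in for h2o_send: deliver one chunk and record the state. */
static void toy_send(toy_generator_t *gen, const char *data, enum toy_send_state state)
{
    strncat(gen->out, data, sizeof(gen->out) - strlen(gen->out) - 1);
    if (state == TOY_FINAL)
        gen->finished = 1;
}

/* Proceed callback: send exactly one row, mark the last one as final. */
static void proceed_sending(toy_generator_t *self)
{
    const char *row;
    enum toy_send_state state;

    assert(self->next_row < self->row_count);
    row = self->rows[self->next_row++];
    state = self->next_row == self->row_count ? TOY_FINAL : TOY_IN_PROGRESS;
    toy_send(self, row, state);
}

/* Stand-in for the event loop: keep asking until the final chunk. */
static void toy_run(toy_generator_t *gen)
{
    while (!gen->finished)
        gen->proceed(gen);
}
```

The point of the design is that the sender never holds more than one row at a time: memory for a row can be released as soon as the next proceed call arrives.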

   

Serving files

A special case is serving static files from disk. This can of course be implemented using h2o_start_response, but h2o has built-in primitives to make this happen:

    h2o_pathconf_t* pathconf = h2o_config_register_path(hostconf, "/", 0);
    h2o_file_register(pathconf, "examples/doc_root", NULL, NULL, 0);

The third parameter of h2o_file_register specifies an array of files that could serve the index of that path. The fourth parameter is an optional map of MIME-types. The final parameter specifies flags. Flags available are:

    H2O_FILE_FLAG_NO_ETAG = 0x1,
    H2O_FILE_FLAG_DIR_LISTING = 0x2,
    H2O_FILE_FLAG_SEND_COMPRESSED = 0x4,
    H2O_FILE_FLAG_GUNZIP = 0x8

If it is known what files to serve exactly, h2o_file_register_file can be used.

   

Passing data to handlers and generators

In general, passing data around in libh2o is somewhat cumbersome, but we hope, at least very efficient. The general technique is that when we set a handler or a generator, we send a pointer to something that starts with a h2o_handler_t or h2o_generator_t respectively. If we have no need to pass additional data, that is in fact all that will be stored at that pointer.

But if we want to pass additional data, we do so by creating a larger struct, that starts with a h2o_handler_t (for a handler):

typedef struct ParameterHandler
{
  h2o_handler_t handler; // must be the first member
  int parameter;
} ParameterHandler;
// .. set pathconf ..
ParameterHandler *handler = 
  (ParameterHandler*) h2o_create_handler(pathconf, sizeof (*handler));
handler->handler.on_req = ourHandler;
handler->parameter = 42;

Within ourHandler we then do the reverse and cast the h2o_handler_t pointer back to a ParameterHandler pointer:

int ourHandler(h2o_handler_t* handler, h2o_req_t* req)
{
  ParameterHandler* phandler = (ParameterHandler*)handler;
  // phandler->parameter will now be 42
  ...
}
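The cast is safe because C guarantees that a pointer to a struct and a pointer to its first member share the same address. Here is a self-contained sketch of the pattern with stand-in types. toy_handler_t and toy_dispatch are hypothetical and only mimic the role of h2o_handler_t and the library calling on_req.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for h2o_handler_t; only the callback slot matters here. */
typedef struct toy_handler {
    int (*on_req)(struct toy_handler *self, int request_id);
} toy_handler_t;

/* Our extended handler: the base struct MUST be the first member. */
typedef struct {
    toy_handler_t super;
    int parameter;
} toy_parameter_handler_t;

static int our_handler(toy_handler_t *self, int request_id)
{
    toy_parameter_handler_t *ph;

    assert(self != NULL);
    /* Valid because super sits at offset 0 of the larger struct. */
    ph = (toy_parameter_handler_t *)self;
    return ph->parameter + request_id;
}

/* What a library does: it only knows about the base type. */
static int toy_dispatch(toy_handler_t *handler, int request_id)
{
    return handler->on_req(handler, request_id);
}
```

If the base struct were moved to any position other than first, the cast in our_handler would read garbage, so it is worth keeping a comment on that member.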
   

To a generator

When we need to pass data to a generator, we use a very similar technique, except without h2o_create_handler's help:

typedef struct State
{
  h2o_generator_t super; // must be the first member
  int parameter;
} State;
...

State* state = (State*)h2o_mem_alloc_shared(&req->pool, sizeof(State), on_dispose);
state->super.proceed = proceedSending;
state->super.stop = stopSending;
state->parameter = 23;

...
void proceedSending(h2o_generator_t *self, h2o_req_t *req)
{
  State* state = (State*)self;
  // state->parameter is now 23
  ...
}

Note that we may need to pass an on_dispose callback if State had any buffers in need of freeing (which is very likely for a generator).

   

Logging

...

   

SSL / TLS

libh2o uses OpenSSL or picotls for HTTPS, for both HTTP/1 and HTTP/2. There is significant interaction between the TLS and HTTP components.

Question: Is picotls standalone? Or does it get its crypto operations from OpenSSL?

When using OpenSSL, we first need to initialize that library:

SSL_load_error_strings();
SSL_library_init();
OpenSSL_add_all_algorithms();

In libh2o, TLS is specific to an accept context as described above. Here is how we hook up OpenSSL to the accept context:

accept_ctx.ssl_ctx = SSL_CTX_new(SSLv23_server_method());
accept_ctx.expect_proxy_line = 0; // makes valgrind happy

Please note that the accept_ctx struct must be zeroed before use.

We can then continue to configure OpenSSL with options:

SSL_CTX_set_options(accept_ctx.ssl_ctx, SSL_OP_NO_SSLv2);
SSL_CTX_set_ecdh_auto(accept_ctx.ssl_ctx, 1);

/* load certificate and private key */
if(SSL_CTX_use_certificate_chain_file(accept_ctx.ssl_ctx, cert_file) != 1) {
  fprintf(stderr, "an error occurred while trying to load server certificate file:%s\n", cert_file);
  return -1;
}

if(SSL_CTX_use_PrivateKey_file(accept_ctx.ssl_ctx, key_file, SSL_FILETYPE_PEM) != 1) {
  fprintf(stderr, "an error occurred while trying to load private key file:%s\n", key_file);
  return -1;
}

char ciphers[]="DEFAULT:!MD5:!DSS:!DES:!RC4:!RC2:!SEED:!IDEA:!NULL:!ADH:!EXP:!SRP:!PSK";
if(SSL_CTX_set_cipher_list(accept_ctx.ssl_ctx, ciphers) != 1) {
  fprintf(stderr, "ciphers could not be set: %s\n", ciphers);
  return -1;
}

Finally, we can hook up HTTP/2 selection to ALPN, which allows a client to immediately select HTTP/2 through the TLS negotiation:

  h2o_ssl_register_alpn_protocols(accept_ctx.ssl_ctx, h2o_http2_alpn_protocols);

Note: there also is support for another selection protocol called NPN, but it appears to no longer be used by clients.

   

HTTP/2 specific features

Question: How much of the pushed data smarts are in libh2o? How many are in “h2o the server”. Is this CASPER?

   

Event loops (libuv, native)

Question: Is there any benefit to using libuv? When would you want to do that? ...

   

H2O as a client library

To facilitate proxying, H2O also supports outgoing HTTP connections. As with the rest of libh2o, this is an industrial strength solution so it requires a bit of work to set up.

h2o_httpclient_connect initiates a connection based on a parsed URL. This will also (asynchronously) resolve IP addresses for you. Once the connection is in place, an on_connect callback is called. This has to perform various mechanics and then returns an on_head callback which is called when the headers have been received from the HTTP server.

on_head can then look at the headers and in turn returns the on_body callback. This then gets called whenever a chunk of body is ready for processing, which can then be found on the h2o_httpclient_t* parameter. The on_body callback can tell libh2o it has consumed from the buffer by calling h2o_buffer_consume.
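The chain of callbacks, where each stage returns the callback for the next stage, can be sketched as follows. All types and functions here (toy_client_t, toy_connect_cb, toy_drive and so on) are hypothetical, the body chunk is passed directly instead of through a buffer on the client object, and returning NULL simply means "stop here" in this model.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef struct toy_client toy_client_t;

/* Each stage's callback returns the callback for the following stage. */
typedef int (*toy_body_cb)(toy_client_t *client, const char *chunk);
typedef toy_body_cb (*toy_head_cb)(toy_client_t *client, int status);
typedef toy_head_cb (*toy_connect_cb)(toy_client_t *client, const char *errstr);

struct toy_client {
    int status;
    size_t body_bytes;
};

static int on_body(toy_client_t *client, const char *chunk)
{
    client->body_bytes += strlen(chunk);
    return 0; /* 0: keep going */
}

static toy_body_cb on_head(toy_client_t *client, int status)
{
    client->status = status;
    return status == 200 ? on_body : NULL; /* only read the body on success */
}

static toy_head_cb on_connect(toy_client_t *client, const char *errstr)
{
    (void)client;
    return errstr == NULL ? on_head : NULL;
}

/* Stand-in for the library driving one response through the stages. */
static void toy_drive(toy_client_t *client, toy_connect_cb connect_cb, int status,
                      const char *const *chunks, size_t n)
{
    toy_head_cb head_cb;
    toy_body_cb body_cb;
    size_t i;

    assert(connect_cb != NULL);
    if ((head_cb = connect_cb(client, NULL)) == NULL)
        return;
    if ((body_cb = head_cb(client, status)) == NULL)
        return;
    for (i = 0; i != n; ++i)
        if (body_cb(client, chunks[i]) != 0)
            return;
}
```

The benefit of this shape is that each stage can decide, based on what it has just seen, which code handles the next stage, or whether there should be a next stage at all.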

Sample code can be found in http1client.c from the libh2o examples directory, where you can see the (verbose) mechanics we skipped over in the preceding description.

   

H2O multithreading

...

   

Notes on asynchronous operations

Requests can be handled completely within a request handler when data is immediately available (in memory) to send to the client. Often this will not be the case.

libh2o offers multiple ways to deal with slow data sources, or with large chunks of data we do not want to keep in memory for a long time.

   

Stop and proceed

As noted above, we can send an initial amount of data with h2o_send and indicate with H2O_SEND_STATE_IN_PROGRESS that this is not yet everything. The proceed callback of the generator will then be called once h2o needs more data from us to send to the client. This callback should then immediately supply that data. For many data sources this is fast enough - for example, when reading from a file.

   

Asynchronous

At other times however we may be serving data sourced from a database or another service reached over the network in which case blocking on our proceed callback is not feasible.

The good news is that libh2o is completely asynchronous itself - it will not get confused if your handler does not immediately do anything with a request.

This means that in the request handler, or in the proceed callback, you can send out a request for more data to your backend, and return control to the libh2o event loop. When you do so however, you must make sure that the event loop picks up when you have new data available!

If you are streaming data from another socket, this is as easy as adding that socket to the main event loop like this:

h2o_socket_t *sock = h2o_evloop_socket_create(dsc->h2o_ctx.loop, 
  oursock, H2O_SOCKET_FLAG_DONT_READ);
// a pointer that tells you which request this belongs to  
sock->data = req; 

// this listens to responses from our socket to turn into http data
h2o_socket_read_start(sock, on_oursock);

When the on_oursock callback gets called, we know it will be able to read data from oursock & pass it on to h2o_send, being careful to mark that as H2O_SEND_STATE_IN_PROGRESS unless we got an end-of-data indicator.

HOWEVER: while waiting for data from the backend socket, the actual libh2o request may have disappeared! There may have been a network error, or the client disconnected, or a timeout happened. This will cause big problems in on_oursock which will then attempt to send data to a dead request. This is not a theoretical problem by the way, it will happen almost immediately under any kind of load.

There are two ways of detecting that a request is gone. The first one is to set the stop callback when calling h2o_start_response. This will get called if the request times out. From that callback, it is then possible to disconnect oursock from the event loop, to close it, or to otherwise set a marker so that the on_oursock callback does not touch the request.

The downside of this method is that in the initial request handler, not everything may already be known to issue a response. For example, it is no longer possible to create a 4xx or 5xx error afterwards. And you may need to do that based on data you received from oursock.

A second way of being informed of a request teardown is by associating some memory with it using h2o_mem_alloc_shared as shown above. If a request gets disposed, the callback associated with this memory gets called, and it too can then take measures to prevent on_oursock from touching the now deleted request.
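A minimal model of this second approach: a small link object, owned by the backend side, holds the request pointer, and the dispose callback of the pool allocation clears it. The names toy_req_t, toy_link_t, on_request_disposed and on_backend_data are hypothetical. In real code the slot passed to on_request_disposed would be memory obtained from h2o_mem_alloc_shared, and on_backend_data would be the socket read callback.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for a request; we only count forwarded bytes. */
typedef struct {
    int bytes_sent;
} toy_req_t;

/* Lives as long as the backend socket, so it outlives the request. */
typedef struct {
    toy_req_t *req; /* NULL once the request has been torn down */
} toy_link_t;

/* Dispose callback: ptr is the pool-allocated slot holding a pointer
 * to the link. Clearing req is the "marker" for the backend side. */
static void on_request_disposed(void *ptr)
{
    toy_link_t **linkp = ptr;
    (*linkp)->req = NULL;
}

/* Backend read callback: returns 1 if data was forwarded, 0 if dropped. */
static int on_backend_data(toy_link_t *link, int nbytes)
{
    assert(link != NULL);
    if (link->req == NULL)
        return 0; /* request is gone: do not touch it */
    link->req->bytes_sent += nbytes;
    return 1;
}
```

The important detail is ownership: the link must not itself live in the request's pool, since it has to remain valid after the request has been disposed.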

Some more context can be found on this GitHub issue.
