1 Hello, and welcome to libh2o
2 Basics
3 HTTP/2 notes
4 Data types & storage
5 Hello, world
6 Object lifetime, memory management
7 Config, context, accept context
7.1 Global configuration
7.2 Accept context
7.3 Context
8 Generators
9 Serving files
10 Passing data to handlers and generators
10.1 To a generator
11 Logging
12 SSL / TLS
13 HTTP/2 specific features
14 Event loops (libuv, native)
15 H2O as a client library
16 H2O multithreading
17 Notes on asynchronous operations
17.1 Stop and proceed
17.2 Asynchronous
NOTE: This is very early work & may have a lot of mistakes. If you find any, please let me know at bert.hubert@powerdns.com or @PowerDNS_Bert.
H2O is a powerful web server used and developed by Fastly. Its mission:
H2O is a new generation HTTP server that provides quicker response to users with less CPU utilization when compared to older generation of web servers. Designed from ground-up, the server takes full advantage of HTTP/2 features including prioritized content serving and server push, promising outstanding experience to the visitors of your web site.
H2O as a web server is very impressive and is seeing more and more use.
H2O is powered by libh2o
, which contains all the web serving power of H2O
itself. libh2o
is also packaged separately for third party use. It is MIT
licensed.
I have used more web server frameworks than is reasonable. libh2o
is by far
the most impressive effort I have ever seen, possibly because as far as I
know, it is the only web serving library that also powers a leading web
serving project.
The one downside of libh2o
is that its documentation is currently very
spread out. Parts live in the
h2o.h
include
file, other parts can be found in answers in the GitHub issue tracker, in
issues tagged with 'FAQ'. A great
(if slightly outdated) introduction can also be found in this
presentation.
Here, I hope to provide an introduction to libh2o that is correct, if not necessarily complete. Over time, the goal is to maintain correctness while coverage increases.
The H2O maintainers are incredibly friendly and helpful. With this document I hope to help them along in improving (lib)h2o.
If you want to help, this file can be edited on GitHub.
H2O comes with a full & leading implementation of HTTP/2. It strives to make HTTP/2 as fast as possible, to the point it can be a universal method of performing calls and transferring data.
Within H2O, HTTP requests are 'events' within connections. Events end up
with a handler
, and handler
s use generator
s to send content.
Of specific note, unlike most libraries, libh2o
can be used fully
asynchronously. This means that your code can receive an HTTP request & go
chew on it. Meanwhile, other HTTP requests come in and also get served. Once
your code is done with the earlier HTTP request, it can send out the
response.
This is specifically relevant since HTTP/2 is fundamentally asynchronous too: many requests may come in at the same time over a single connection. These all need to get serviced in parallel, and not sequentially.
By default libh2o
is protocol agnostic. Your code does not need to know
if it is dealing with HTTP/1 or HTTP/2, and out of the box both are
available. The behaviour of how HTTP/1 sessions might be upgraded to HTTP/2
and the interaction with TLS/SSL can of course be tweaked.
Specific HTTP/2 features such as server push are of course only available on HTTP/2 connections.
libh2o
is written in C. This makes the library very accessible, but also
means that H2O has to create some data structures from scratch. Here are a
few types you will encounter a lot:
H2O_STRLIT
: h2o
passes strings around as a pointer and a length. This
saves on repetitive strlen()
calls and also makes it possible to pass
arbitrary binary data. To pass a “classical C string”,
H2O_STRLIT("GET")
is used, which then expands to:
"GET", strlen("GET")
.
h2o_memis
which accepts two
pointer/length pairs. An example:
if(h2o_memis(req->method.base, req->method.len, H2O_STRLIT("GET")))
h2o_iovec_t
: a structure describing memory as above, a pointer and a
length field. Created by calling h2o_iovec_init(ptr, len)
.
h2o_vector_t
: A vector (dynamic array) of items.
Memory allocation is always difficult. To make it easier, libh2o
offers us
pools associated with HTTP requests and connections. Any data allocated to a
pool will get released automatically when the pool gets disposed.
The libh2o
“Hello World” example server is longer than you may be used to.
This is because h2o is a very powerful library and it needs to be generic
enough to cover a lot of use cases. Some simpler libraries may allow you to
be up and running in 12 lines but leave you hanging if you need to set a
socket option.
A basic libh2o
web server takes around 80 lines to deliver a fully
functional web server. What follows is a slightly reordered version of
simple.c which in turn is a slightly modified version of
simple.c
from the h2o
examples directory.
If you want to get going quickly, there is a Makefile. Simply clone the repository and run a Make.
First the basics:
static h2o_globalconf_t config;
static h2o_context_t ctx;
static h2o_accept_ctx_t accept_ctx;
These three contain global h2o
configuration, a context
for your
web server plus an accept context
for your listener thread. Note that the
static
is not accidental: be sure to zero these structs before using
them!
Then to get going:
h2o_config_init(&config);
h2o_hostconf_t* hostconf = h2o_config_register_host(&config, h2o_iovec_init(H2O_STRLIT("default")), 65535);
This initializes the global config to default values and it registers a host
called 'default'. Note that this accepts a h2o_iovec_t
as described above.
When registering the host we also pass a port, where 65535 means 'the
default port for the protocol used', or in other words '80' or '443'.
If only one host is registered, all of this is moot, since all queries, regardless of host name or port, then end up with the first host.
When registering multiple host names, a single registration is enough when using standard ports (80, 443). If non-standard ports are used, you must register a name twice, once for each port (say, 8000, 4430).
Next up we register a path
for "/hello”. We then create a handler for this
path, and we add a function helloHandler
to it:
h2o_pathconf_t *pathconf = h2o_config_register_path(hostconf, "/hello", 0);
h2o_handler_t *handler = h2o_create_handler(pathconf, sizeof(*handler));
handler->on_req = handler;
Note that paths are checked in order. This means that if the "/" path is registered first, no subsequent paths are ever going to match. So register paths from most specific to most general.
We then initialize the global ctx
with the global config
.
h2o_context_init(&ctx, h2o_evloop_create(), &config);
accept_ctx.ctx = &ctx;
accept_ctx.hosts = config.hosts;
Next up, listening and setting up the event loop:
struct sockaddr_in addr;
int fd, reuseaddr_flag = 1;
h2o_socket_t *sock;
memset(&addr, 0, sizeof(addr));
addr.sin_family = AF_INET;
addr.sin_addr.s_addr = htonl(0x7f000001);
addr.sin_port = htons(12345);
if ((fd = socket(AF_INET, SOCK_STREAM, 0)) == -1 ||
setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &reuseaddr_flag, sizeof(reuseaddr_flag)) != 0 ||
bind(fd, (struct sockaddr *)&addr, sizeof(addr)) != 0 || listen(fd, SOMAXCONN) != 0) {
return -1;
}
This is the usual longwinded C way of creating a socket, in this case, one that listens on 127.0.0.1:12345.
sock = h2o_evloop_socket_create(ctx.loop, fd, H2O_SOCKET_FLAG_DONT_READ);
h2o_socket_read_start(sock, on_accept);
This is the h2o
specific bit, where we add the socket we created to the
event loop, and tell h2o
not to read from this socket but if it becomes
readable to call our handler on_accept
. This is what happens there:
static void on_accept(h2o_socket_t *listener, const char *err)
{
h2o_socket_t *sock;
if (err != NULL) {
return;
}
if ((sock = h2o_evloop_socket_accept(listener)) == NULL)
return;
h2o_accept(&accept_ctx, sock);
}
This is effectively a filter that calls accept(2)
on the listener socket
(through h2o_evloop_socket_accept) and then tells h2o
to accept this new
socket as well.
Once all this is setup we can enter the h2o
event loop:
while (h2o_evloop_run(ctx.loop, INT32_MAX) == 0)
;
Here is the helloHandler
function we hooked above:
static int helloHandler(h2o_handler_t* handler, h2o_req_t* req)
{
if (!h2o_memis(req->method.base, req->method.len, H2O_STRLIT("GET")))
return -1;
req->res.status = 200;
req->res.reason = "OK";
h2o_add_header(&req->pool, &req->res.headers, H2O_TOKEN_CONTENT_TYPE,
NULL, H2O_STRLIT("text/plain"));
h2o_send_inline(req, H2O_STRLIT("Hello, world\n"));
return 0;
}
Note that this uses h2o_memis
to check it is getting a GET request, and if
it isn't, the handler returns −1 to indicate it did not handle the request.
We then set status, reason, content-type, and use the helper function
h2o_send_inline
to send a response.
h2o_send_inline
is a slower convenience function for sending chunks of
memory to a client. Behind this function hide generators
which are a more
generic and faster way of sending data, without having to supply all data in
one go.
Finally, please note that libh2o
is event driven. If our helloHandler
sleeps for a second, nothing else will happen during that second! Please
read on for how to deal with slow data sources.
As noted, memory management is hard. Within libh2o most management is
automated for you. For example, when you get passed a request (h2o_req_t
),
it is not your responsibility to free it when you are done.
And in fact, you can use a request to allocate data associated with that request, and this will be deallocated if for whatever reason the request is done or goes away.
As an example, to allocate an stdio FILE pointer:
FILE** fpptr = h2o_mem_alloc_shared(req->pool, sizeof(FILE*), on_FILE_close);
*fpptr = fopen("/etc/motd", "r");
// do things with file
Note that we also passed a callback on_FILE_close
and it looks like this:
static void on_FILE_close(void *ptr)
{
FILE** fpptr = (FILE**) ptr;
if(*fpptr)
fclose(fpptr);
}
When used like this, we can be sure that memory for our file pointer is cleaned up, AND that the file pointer will get closed when the request is done or otherwise disposed of. Such automated memory and resource management is vital when using asynchronous operations.
In this small sample, we've been passing around an H2O config, a context and an accept context.
NOTE: This section is quite speculative
The global configuration actually is not global, as in, it is possible to run several completely independent h2o instances within a single binary. It is however also possible to have a single instance that serves on several listening sockets and also offers multiple servers in there, and these then share the 'global' configuration.
The global configuration knows specifies things like HTTP/1 or HTTP/2 timeouts, if HTTP/1 connections should attempt an upgrade to HTTP/2, maximum size of POST data and many other settings. The defaults are good though.
An accept context can be plaintext or a TLS context. One accept context can host multiple sockets, and sockets are registered to a context.
Note that hosts that are registered exist within all contexts, and thus also on all sockets.
The context owns the event loop, to which multiple sockets can report, that in turn can report to different accept contexts.
It also keeps counters for HTTP/1 and HTTP/2 connections.
As noted above, h2o_send_inline
is a convenience function for sending data
in response to an HTTP request. If we want to send a fixed blob of data we
already have in memory this is fine, but if we for example want to send the
large output of a SQL query, which arrives in separate rows, this is not the
best interface.
The 'long form' of sending data to a request is by setting a generator
:
static h2o_generator_t generator = {proceedSending, stopSending};
h2o_start_response(req, &generator);
h2o_iovec_t buf;
// gather initial data from SQL query, store it in buf
h2o_send(req, &buf, 1, H2O_SEND_STATE_IN_PROGRESS);
First we register two callbacks: proceedSending
and stopSending
. The
last one is called when the HTTP session went away for any reason, and
offers us the ability to close down the SQL session or perform any other
cleanup.
Note: As elsewhere, the static
is not optional. When we pass a pointer
to the generator, libh2o
makes no copy for us. Without static
, our
generator would soon be pointing to dangling memory. There is a reason for
this behaviour which is explained below.
We then gather some initial data and send that using h2o_send
. In this
syntax we can pass an array of h2o_iovec_t
buffers, but here we pass just
1. The final parameter H2O_SEND_STATE_IN_PROGRESS
means that at a later
stage we might have more data to offer.
Importantly, while h2o_send
is working on the data, you must not touch or
free the data you passed to be sent: libh2o
made no copy for you. This in
contrast to h2o_send_inline
which as a convenience does so for you.
Once libh2o
has sent out the initial data, it will call our
proceedSending
callback to ask for more. That is also the moment we can
free any memory we allocated for h2o_send
. If in proceedSending
we find
that we've hit the end of the data we want to serve, we emit a final call to
h2o_send
with as final parameter H2O_SEND_STATE_FINAL
.
We are not guaranteed to get a call to stopSending
, so any cleanup should
be performed once we pass H2O_SEND_STATE_FINAL
.
A special case is serving static files from disk. This can of course be
implemented using h2o_start_response
, but h2o
has built-in primitives to
make this happen:
h2o_pathconf_t* pathconf = h2o_config_register_path(hostconf, "/", 0);
h2o_file_register(pathconf, "examples/doc_root", NULL, NULL, 0);
The third parameter of h2o_file_register
specifies an array of files that
could serve the index of that path. The fourth parameter is an optional map
of MIME-types. The final parameter specifies flags. Flags available are:
H2O_FILE_FLAG_NO_ETAG = 0x1,
H2O_FILE_FLAG_DIR_LISTING = 0x2,
H2O_FILE_FLAG_SEND_COMPRESSED = 0x4,
H2O_FILE_FLAG_GUNZIP = 0x8
If it is known what files to serve exactly, h2o_file_register_file
can be
used.
In general, passing data around in libh2o
is somewhat cumbersome, but we
hope, at least very efficient. The general technique is that when we set a
handler
or a generator
, we send a pointer to something that starts with
a h2o_handler_t
or h2o_generator_t
respectively. If we have no need to
pass additional data, that is in fact all that will be stored at that
pointer.
But if we want to pass additional data, we do so by creating a larger
struct, that starts with a h2o_handler_t
(for a handler):
struct ParameterHandler
{
h2o_handler_t handler;
int parameter;
};
// .. set pathconf ..
ParameterHandler *handler =
(ParameterHandler*) h2o_create_handler(pathconf, sizeof (*handler));
handler->handler.on_req = ourHandler;
handler->parameter = 42;
Within ourHandler we then do the reverse and cast the h2o_handler_t
pointer back to a ParameterHandler
pointer:
int ourHandler(h2o_handler_t* handler, h2o_req_t* req)
{
ParameterHandler* phandler = (ParameterHandler*)handler;
// phandler->parameter will now be 42
...
}
When we need to pass data to a generator, we use a very similar technique,
except without h2o_create_handler
's help:
struct State
{
h2o_generator_t super;
int parameter;
};
...
State* state = (State*)h2o_mem_alloc_shared(&req->pool, sizeof(State), on_dispose);
state->super = {proceedSending, stopSending}
state->parameter = 23;
...
void proceedSending(h2o_generator_t *self, h2o_req_t *req)
{
State* state = (State*)self;
// state->parameter is now 23
...
}
Note that we may need to pass an on_dispose
callback if State had any
buffers in need of freeing (which is very likely for a generator).
...
libh2o
uses OpenSSL or picotls for HTTPS and HTTPS/2. There is significant
interaction between the TLS and HTTP components. When using OpenSSL, we
first need to initialize that library:
Question: Is picotls standalone? Or does it get its crypto operations from OpenSSL?
SSL_load_error_strings();
SSL_library_init();
OpenSSL_add_all_algorithms();
In libh2o
, TLS is specific for an accept context as described above. Here
is how we hook op OpenSSL to the accept context:
accept_ctx.ssl_ctx = SSL_CTX_new(SSLv23_server_method());
accept_ctx.expect_proxy_line = 0; // makes valgrind happy
Please note that the accept_ctx
struct must be zeroed before use.
We can then continue to configure OpenSSL with options:
SSL_CTX_set_options(accept_ctx.ssl_ctx, SSL_OP_NO_SSLv2);
SSL_CTX_set_ecdh_auto(accept_ctx.ssl_ctx, 1);
/* load certificate and private key */
if(SSL_CTX_use_certificate_chain_file(accept_ctx.ssl_ctx, cert_file) != 1) {
fprintf(stderr, "an error occurred while trying to load server certificate file:%s\n", cert_file);
return -1;
}
if(SSL_CTX_use_PrivateKey_file(accept_ctx.ssl_ctx, key_file, SSL_FILETYPE_PEM) != 1) {
fprintf(stderr, "an error occurred while trying to load private key file:%s\n", key_file);
return -1;
}
char ciphers[]="DEFAULT:!MD5:!DSS:!DES:!RC4:!RC2:!SEED:!IDEA:!NULL:!ADH:!EXP:!SRP:!PSK";
if(SSL_CTX_set_cipher_list(accept_ctx.ssl_ctx, ciphers) != 1) {
fprintf(stderr, "ciphers could not be set: %s\n", ciphers);
return -1;
}
Finally, we can hook up HTTP/2 selection to ALPN, which allows a client to immediately select HTTP/2 through the TLS negotiation:
h2o_ssl_register_alpn_protocols(accept_ctx.ssl_ctx, h2o_http2_alpn_protocols);
Note: there also is support for another selection protocol called NPN, but it appears to no longer be used by clients.
Question: How much of the pushed data smarts are in libh2o? How many are in “h2o the server”. Is this CASPER?
Question: Is there any benefit to using libuv? When would you want to do that? ...
To facilitate proxying, H2O also supports outgoing HTTP connections. As with
the rest of libh2o
, this is an industrial strength solution so it requires
a bit of work to setup.
h2o_httpclient_connect
initiates a connection based on a parsed URL. This
will also (asynchronously) resolve IP addresses for you. Once the
connection in place, an on_connect
callback is called. This has to
perform various mechanics and then returns an on_head
callback which is
called when the headers have been received from the HTTP server.
on_head
can then look at the headers and in turn returns the on_body
callback. This then gets called whenever a chunk of body is ready for
processing, which can then be found on the h2o_httpclient_t
* parameter.
The on_body
callback can tell libh2o
it has consumed from the buffer
by calling h2o_buffer_consume
.
Sample code can be found in
http1client.c
from the libh2o
examples directory, where you can see the (verbose)
mechanics we skipped over in the preceeding description.
...
Requests can be handled completely within a request handler when data is immediately available (in memory) to send to the client. Often this will not be the case.
libh2o
offers multiple ways to deal with slow data sources, or with large
chunks of data we do no want to keep in memory for a long time.
As noted above, we can send an initial amount of data with h2o_send
but
note that this is not yet everything. The proceed
callback of the
generator will then be called once h2o
needs more data from us to send to
the client. This callback should then immediately supply that data. For many
data sources this is fast enough - for example, when reading from a file.
At other times however we may be serving data sourced from a database or
another service reached over the network in which case blocking on our
proceed
callback is not feasible.
The good news is that libh2o
is completely asynchronous itself - it will
not get confused if your handler does not immediately do anything with a
request.
This means that in the request handler, or in the proceed
callback, you
can send out a request for more data to your backend, and return control to
the libh2o
event loop. When you do so however, you must make sure that
event loop picks up when you have new data available!
If you are streaming data from another socket, this is as easy as adding that socket to the main event loop like this:
h2o_sock_t sock = h2o_evloop_socket_create(dsc->h2o_ctx.loop,
oursock, H2O_SOCKET_FLAG_DONT_READ);
// a pointer that tells you which request this belongs to
sock->data = req;
// this listens to responses from our socket to turn into http data
h2o_socket_read_start(sock, on_oursock);
When the on_oursock
callback gets called, we know it will be able to read
data from oursock
& pass it on to h2o_send
, being careful to mark that
as H2O_SEND_STATE_IN_PROGRESS
unless we got an end-of-data indicator.
HOWEVER: while waiting for data from the backend socket, the actual
libh2o
request may have disappeared! There may have been a network error,
or the client disconnected, or a timeout happened. This will cause big
problems in on_oursock
which will then attempt to send data to a dead
request. This is not a theoretical problem by the way, it will happen almost
immediately under any kind of load.
There are two ways of detecting that a socket is gone. The first one is to
set the stop
callback when calling h2o_start_response
. This will get
called if the request times out. From that callback, it is then possible to
disconnect oursock
from the event loop or to close it or to otherwise set
a marker that the on_oursock
callback does not touch the request.
The downside of this method is that in the initial request handler, not
everything may already be known to issue a response. For example, it is no
longer possible to create a 4xx or 5xx error afterwards. And you may need to
do that based on data you received from oursock
.
A second way of being informed of a request teardown is by associating some
memory with it using h2o_mem_alloc_shared
as shown above. If a request
gets disposed, the callback associated with this memory gets called, and it
too can then take measures to prevent on_oursock
from touching the now
deleted request.
Some more context can be found on this GitHub issue.