**dnsdist fundamentals**

Introduction
===============================================================================
This document outlines core `dnsdist` concepts and illustrates them with common use cases. The goal is not to list each and every `dnsdist` feature. Instead, this document explains all the parts of `dnsdist` and how they hang together.

All `dnsdist` functionality is described in the [README](http://dnsdist.org/README/), which is authoritative and complete. The present document, however, may prove instrumental in making sense of the README.

As of early 2017, `dnsdist` operates in front of large-scale CDN nameservers, and it protects and load balances the DNS resolvers of some of the world's largest telecommunication service providers, supplying DNS services to at least 50 million internet subscribers.

Basic functionality
===================
`dnsdist` is a modern UNIX daemon which loads a configuration file (typically `dnsdist.conf`) at startup. It can run in the foreground with a console for real-time querying and (re)configuration. It can also run as a regular daemon and be controlled remotely.

`dnsdist` is configured and controlled via a Lua-based environment, and optionally packets can pass through Lua for unlimited flexibility. Most configurations, however, run fully natively, which is required to support the 500 kqps performance `dnsdist` can deliver.

In its simplest form, `dnsdist`:

1. Receives DNS query packets from clients
2. Forwards these to a backend server
3. Forwards the response packet to the client

While doing so, it gathers statistics on clients, queries and backend servers.
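
The statistics gathered along the way can be inspected from the console. The fragment below is a minimal sketch that uses only commands named elsewhere in this document; the output depends entirely on live traffic, so none is shown here.

```lua
-- Typed at the dnsdist console (see "The console" for how to connect).
dumpStats()            -- global counters
topClients()           -- busiest clients seen in the ring buffers
topQueries()           -- most frequently queried names
showResponseLatency()  -- latency histogram of recent responses
```
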
A full and useful `dnsdist.conf` file is:

~~~~
setLocal("192.0.2.1")
addLocal("2001:db8::2:1")
addACL("192.0.2.0/24")
addACL("2001:db8::/32")
newServer("198.51.100.1")
newServer("198.51.100.2")
~~~~

This file is equivalent to starting `dnsdist` as:

~~~~
# dnsdist -l 192.0.2.1 -l 2001:db8::2:1 -a 192.0.2.0/24 -a 2001:db8::/32 \
          198.51.100.1 198.51.100.2
~~~~

In graphical form, this corresponds to the following configuration:

*************************************************************************************
*   +--------------+   +--------------+
*   |     DNS1     |   |     DNS2     |
*   | 198.51.100.1 |   | 198.51.100.2 |
*   +--------------+   +--------------+
*              ^           ^
*              |           |
*              v           v
*           +-----------------+
*           |     dnsdist     |
*           +-----------------+
*      2001:db8::2:1   192.0.2.1
*              ^           ^
*              |           |
*              v           v
*     --------------   ------------
*      2001:db8::/32   192.0.2.0/24
*       IPv6 clients   IPv4 clients
*************************************************************************************

This forwards queries from the IPv4 and IPv6 ranges to two backend servers. If either DNS1 or DNS2 is down or slow, traffic will move to the other server seamlessly. By default a sensible server selection policy is used.

Please note that in all `dnsdist` configurations, the software only forwards queries and relays responses. Queries are not retransmitted, and `dnsdist` does not perform any DNS resolving itself. However, if so configured, `dnsdist` will block, delay or modify a query, or respond to it directly itself.

Note that backend servers should source their own data, either from local zones or the internet. `dnsdist` does not provide NAT service for backends.

Some common configurations
==========================
Before we delve into more theory, here are some frequently useful configuration examples. These are themselves educational on how `dnsdist` works, but please do read on to the later sections that explain how these examples really work.


IPv6 proxy function
-------------------
Useful for authoritative (legacy) servers unable to provide IPv6 service.

```
setLocal("2001:db8::2:1")  -- our IPv6 public address
newServer("198.51.100.1")  -- the IPv4 backend
setACL("::/0")             -- whole world can talk to authoritative server

-- Optionally, if dnsdist IP address is trusted by backend
-- make sure whole world can't AXFR via dnsdist
addAction(AndRule({OrRule({QTypeRule(dnsdist.AXFR), QTypeRule(dnsdist.IXFR)}),
                   NotRule(makeRule("198.51.100.0/24"))}),
          RCodeAction(dnsdist.REFUSED))
```

Basic failover and statistics
-----------------------------
The following setup will balance traffic over three backends, export statistics to the [public PowerDNS metronome](https://metronome.powerdns.com/?server=All&beginTime=-7200), and host the web server on http://127.0.0.1:8083/, accepting any username with the password 'supersecretpassword'.

```
webserver("127.0.0.1:8083", "supersecretpassword", "supersecretAPIkey")
setLocal("2001:db8::2:1")  -- our IPv6 public address
newServer({address="198.51.100.1", name="recursor1"})
newServer({address="198.51.100.2", name="recursor2"})
newServer({address="198.51.100.3", name="recursor3"})
carbonServer("37.252.122.50", "ourhostname")
```

Replace 37.252.122.50 with the IP address of your own Metronome or Graphite server if you have one.

Once enabled, try commands like `showResponseLatency()`, `topSlow()`, `topResponses(10, dnsdist.SERVFAIL)` and `topClients()` to learn about your users.

DNSCrypt proxy
--------------
First generate the key and certificate:

~~~~
generateDNSCryptProviderKeys("/path/to/providerPublic.key", "/path/to/providerPrivate.key")
-- output: Provider fingerprint is: E1D7:2108:9A59:BF8D:F101:16FA:ED5E:EA6A:9F6C:C78F:7F91:AF6B:027E:62F4:69C3:B1AA
generateDNSCryptCertificate("/path/to/providerPrivate.key", "/path/to/resolver.cert", "/path/to/resolver.key", 1, os.time(), os.time()+365*86400)
~~~~

This generates a certificate with a validity of one year from now.
Next, add to an existing `dnsdist.conf`:

```
addDNSCryptBind("127.0.0.1:8443", "2.providername", "resolver.cert", "resolver.key")
```

This works just like a call to `addLocal()`, except that it offers DNSCrypt service.

Simple rate limiting for a resolver
-----------------------------------
When serving the internet needs of large numbers of users, some of them are bound to misbehave, either wittingly or unwittingly. The following configuration limits the impact of many common streams of unwanted traffic:

```
setLocal("2001:db8::2:1")  -- our IPv6 public address
newServer({address="198.51.100.1", name="recursor1"})
newServer({address="198.51.100.2", name="recursor2"})
newServer({address="198.51.100.3", name="recursor3"})

-- For any IPv4 address and for any IPv6 /64, if traffic exceeds 10 QPS
-- immediately set TC bit on response causing fallback to TCP/IP
addAction(MaxQPSIPRule(10, 32, 64), TCAction())
addAnyTCRule()  -- do the same for all ANY queries
```

When this is in place, use `showRules()` to see what your rules are protecting you from.

Untangling a combined authoritative & resolver service
------------------------------------------------------
For historical reasons, single IP addresses have frequently ended up supplying both authoritative and resolving DNS service. This is widely seen as a bad idea. However, IP addresses end up staying hardcoded.

This setup sends all queries with the Recursion Desired bit set to a pool of resolvers, and all the rest to a pool of authoritative servers. In addition, a few names are hardcoded, which is sometimes needed to ensure proper operation of CPE, set-top boxes and modems.
```
setLocal("2001:db8::2:1")  -- our IPv6 public address
newServer({address="198.51.100.1", name="recursor1", pool="resolvers"})
newServer({address="198.51.100.2", name="recursor2", pool="resolvers"})
newServer({address="198.51.100.3", name="auth1"})
newServer({address="198.51.100.4", name="auth2"})
customerACLs={"198.51.100.0/24", "198.51.101.0/24", "2001:db8::/32"}
addDomainSpoof("ntp.example.com", {"198.51.100.5", "2001:db8::2"})
addDomainSpoof("tftp.example.com", {"198.51.100.6", "2001:db8::3"})

-- only send customer originated recursion desired queries to our resolvers
addAction(AndRule({makeRule(customerACLs), RDRule()}), PoolAction("resolvers"))
```

Speeding up an overloaded backend
---------------------------------
Because `dnsdist` can focus on a single task, it can do it quite well. The `dnsdist` packet cache is therefore unreasonably fast and memory efficient.

Using `dnsdist` as a cache in front of authoritative or recursive servers is additionally a straightforward way to put more CPUs to work. This means that even if the `dnsdist` cache were not very good, it would still help provide more CPU cycles for DNS.

```
setLocal("2001:db8::2:1")  -- our IPv6 public address
newServer({address="198.51.100.1", name="recursor1"})
newServer({address="198.51.100.2", name="recursor2"})
customerACLs={"198.51.100.0/24", "198.51.101.0/24", "2001:db8::/32"}
setACL(customerACLs)
pc = newPacketCache(1000000, 86400, 0, 60, 60)
getPool(""):setCache(pc)
```

This installs a 1 million entry cache with sensible TTL parameters. In case of dire overload and backend meltdown, add:

```
setStaleCacheEntriesTTL(3600)
```

This gives `dnsdist` permission to serve data that expired up to one hour ago, should the backend not be providing new data.

The console
===========
If provided with the correct key, a `dnsdist` binary can connect (over TCP/IP) to a running `dnsdist` to inspect and modify the configuration.
In addition, the statistics and ring buffers can be queried.

The console is available when `dnsdist` runs in the foreground. It can be launched as a TCP/IP service with `controlSocket("[::]:2000")`. Communications should be protected, and a key can be created using the `makeKey()` command. This generates a configuration line that sets up the freshly generated key.

To then connect to a running `dnsdist`, execute `dnsdist -c`. This will read the `controlSocket` address from `dnsdist.conf`.

Lua
---
The console is actually an interface to the Lua environment in `dnsdist`, which incidentally is also what the configuration file operates in.

Even though the console and configuration file are Lua programs, `dnsdist` in production typically does not pass packets through Lua (even though Lua is a very high performance language).

dnsdist with all hooks
======================
In a more dressed-up configuration, `dnsdist`:

1. Receives DNS query packets from clients
2. Inspects these packets and applies rules which optionally:
    * Delay a packet
    * Drop a packet
    * Modify a packet
    * Answer a packet
3. Forwards the packet to a backend server
4. Inspects the response packet from the backend server and applies rules which can optionally:
    * Drop a response
    * Delay a response
5. Sends the response packet to the client

Optionally, step 3 can be skipped when `dnsdist` is configured to run with a packet cache. The packet cache assumes backends will send equivalent responses to equivalent queries, and if a query has been seen before, we send the client the same response (adjusted for case, DNS id and TTL countdown).
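
The four query-side hooks in step 2 each correspond to a rule/action pair. The fragment below is a sketch rather than a recommended policy: the domain names and the netmask are placeholders, and `DelayAction` and `NoRecurseAction` are not used elsewhere in this document, so check them against the README for your version.

```lua
-- Delay: hold back queries for one (placeholder) domain by 100 ms
addAction(makeRule("slow.example.com"), DelayAction(100))
-- Drop: silently discard queries from a (placeholder) netmask
addAction(makeRule("192.0.2.128/25"), DropAction())
-- Modify: clear the Recursion Desired bit before forwarding
addAction(makeRule("norecurse.example.com"), NoRecurseAction())
-- Answer: respond directly with REFUSED, without consulting a backend
addAction(makeRule("blocked.example.com"), RCodeAction(dnsdist.REFUSED))
```

Because rules are evaluated in order, the position of each line matters once several rules can match the same query.
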
The following diagram describes the route a packet takes through `dnsdist` and the backend server:

*************************************************************************************
*
*                +-----------+
*                |  Server   +-------> Response Rules
*                +-----------+               |
*                      ^                     |
*                      |                     |
*              Server Selection              |
*                      ^                     v
*                      |               +------------+
* +-----------+        +-------------> | Pool Cache |
* |Maintenance|        |               +------------+     -----> Carbon stats
* +---------+-+ Pool Selection               |
*     ^     |          ^                     |            -----> SNMP
*     |     |          |                     |
*     v     '--+--> Rules (dyn, static)      |            -----> Protobuf
* +-------+    |       ^                 Cache Hit
* | Ring  | <----------+               Response Rule      <----> http API
* |Buffers|    |       |                     |
* +-------+    |      ACL                    |            -----> Status page
*              |       ^                     |
*              |       |                     |            <----> Console
*              | listen-socket               |
*              |       ^                     |
*              v       |                     |
*             Kernel eBPF rules              |
*                      ^                     |
*                      |                     |
*                 +----------+               |
*                 |  Client  + <-------------+
*                 +----------+
*
*************************************************************************************

Pools and backend servers
=========================
Incoming queries that need to be sent to a backend are assigned to a *pool*. Within such a pool reside one or more backend servers, each with its own attributes and performance metrics.

All servers within a pool should provide the same kind of service. In other words, `dnsdist` should be free to send a query to any server in the pool and get a useful answer. If backends are fundamentally different, they should get their own pool.

Note that a backend can be in multiple pools.

`dnsdist` supports multiple policies for picking a server within a pool. The default algorithm, `leastOutstanding`, is usually fine. In the common case, this selects the server with the lowest number of outstanding queries.

Cache
-----
`dnsdist` can attach a cache to a pool. This cache assumes that if a query ends up in the pool a second time, it can be served the same response (as long as the TTL has not expired). The `dnsdist` cache is unreasonably fast and memory efficient and typically provides a performance benefit against most backends.
The cache can optionally serve expired data, which is great in case backends are overloaded, for example by denial of service traffic.

Rules & Actions
===============
In various places (on receipt of a query, on receipt of a backend response, on a cache hit), Rules are run in order, and they can lead to Actions.

Rules can match on many attributes of a query or a response:

* Source / Destination
* Query name
* Query type / Opcode
* Flags
* Query rate, query rate per subnet, per domain

Actions in turn have been listed above: drop, delay, modify, and answer a query.

Incoming rules & actions
------------------------
For incoming rules, the basic syntax is `addAction(DNS Rule, DNS Action)`. Various syntactic sugar rules are available for common Rule/Action combinations:

~~~~
addDomainSpoof("download.example.com", {"192.0.2.2", "2001:db8::3:1"}) -- hardcodes A and AAAA records for specified (sub)domain
addDomainBlock("example.com") -- will block example.com, www.example.com etc
~~~~

These two rules are equivalent to:

~~~~
addAction(makeRule("download.example.com"), SpoofAction({"192.0.2.2", "2001:db8::3:1"}))
addAction(makeRule("example.com"), DropAction())
~~~~

These in turn can be shortened to:

~~~~
addAction("download.example.com", SpoofAction({"192.0.2.2", "2001:db8::3:1"}))
addAction("example.com", DropAction())
~~~~

Besides rules, `addAction` accepts as its first parameter:

* A string that is either a domain name or netmask
* A list of strings that are either domain names or netmasks

Generated rules can be inspected with `showRules()`.

Rules are evaluated in order. Some actions stop processing of further rules (like `DropAction()`), but others merely happen (like `LogAction()`).

Rules can be removed with `clearRules()` and `rmRule(number)`, and reordered with `mvRule(from, to)`. As a convenience, `topRule()` takes the last rule and makes it the first one.

Response rules & actions
------------------------
Response rules act on answers received from backend servers.
List these rules with `showResponseRules()`. They are added using `addResponseAction()` and removed or reordered using `rmResponseRule()` and `mvResponseRule()`, analogous to incoming rules.

Incoming Actions cannot be applied to responses; their Response variants must be used instead. As an example:

~~~~
addResponseAction(RCodeRule(dnsdist.NXDOMAIN), DelayResponseAction(1000))
~~~~

This delays NXDOMAIN responses by 1 second.

Cachehit rules & actions
------------------------
Optionally, `dnsdist` can cache answers (per pool), providing a very attractive speedup against most backends. Such cache hit responses do not pass through the regular Response rules, and they might not need to, since they incur very little processing overhead.

An example:

~~~~
addCacheHitResponseAction(RCodeRule(dnsdist.NXDOMAIN), DelayResponseAction(1000))
~~~~

The same rules that are applied to regular responses can be added using `addCacheHitResponseAction()`. Such rules can then be removed or reordered using `rmCacheHitResponseRule()` and `mvCacheHitResponseRule()`, analogous to incoming rules.

Ring buffers
============
Queries and responses get added to their own ring buffers, which contain details of the last *n* packets. The ring buffers can be queried to gather interesting statistics on traffic currently passing by.

Interesting commands querying the ring buffers are:

* `grepq()`: search both ring buffers for traffic for certain domains, from certain netmasks or response times in excess of *n* msec.
* `topQueries()`, `topResponses()`, `topSlow()`: report on queries and responses
* `topBandwidth()`, `topClients()`: report on clients
* `showResponseLatency()`: response latency histogram

Kernel based eBPF filtering
===========================
On Linux, `dnsdist` can export certain filtering rules to the kernel. This allows traffic to be dropped at gigabit line rate, based on source netmask or query name.
If configured via the `maintenance()` callback, this can provide powerful automated denial of service protection.

eBPF filters can be attached to some or all IP addresses `dnsdist` listens on. Sample use, which attaches to all IP addresses `dnsdist` is bound to:

~~~~
bpf = newBPFFilter(1024, 1024, 1024)
bpf:attachToAllBinds()
bpf:block(newCA("2001:DB8::42"))
bpf:blockQName(newDNSName("evildomain.com"), 255)
bpf:getStats()
~~~~

`dnsdist` also supports adding dynamic, expiring blocks to a BPF filter.

Maintenance
===========
Periodically, `dnsdist` calls the `maintenance()` function if it is defined in the configuration. This function can query the ring buffers and instigate dynamic blocks to drop traffic, or instruct the kernel to do so.

The basic tools for `maintenance()` are `exceedServFails()`, `exceedNXDOMAINs()`, `exceedRespByteRate()`, `exceedQRate()` and `exceedQTypeRate()`. These functions inspect the ring buffers and return a list of IP addresses exceeding the thresholds. Such a list can be supplied to `addDynBlocks()` or `addBPFFilterDynBlocks()` (the kernel based variant).

A sample minimal `maintenance()` for the latter case is:

~~~~
bpf = newBPFFilter(1024, 1024, 1024)
setDefaultBPFFilter(bpf)
dbpf = newDynBPFFilter(bpf)
function maintenance()
        addBPFFilterDynBlocks(exceedQRate(20, 10), dbpf, 60)
        dbpf:purgeExpired()
end
~~~~

Statistics
==========
Statistics can be queried using `dumpStats()` from the console. In addition, statistics can be pushed to a server supporting the Carbon protocol, like Graphite or our own Metronome. Metronome is configured out of the box to also display per pool and per backend server statistics.

`showResponseLatency()` will provide a latency histogram on the console.

Finally, statistics are also available via the HTTP API.

The webserver
=============
The webserver provides an attractive display of ongoing rules, performance, CPU load, backend servers and observed attacks.
It can be launched with: `webserver("127.0.0.1:8083", "webpassword", "APIkey")`

Note that visiting the live graphing status page imposes a load on `dnsdist` when configured with large ring buffers.

The API
=======
Like the PowerDNS Authoritative Server and the PowerDNS Recursor, `dnsdist` offers a RESTful API to query statistics and perhaps one day also to make changes.

Sample paths include `/api/v1/servers/localhost/config`, `/jsonstat?command=stats`, `/jsonstat?command=dynblocklist` and `/jsonstat`.

Accessing the API requires the use of the second password (APIkey) passed to `webserver()`.

Other notable features
======================
Some other notable features are listed below.

Protocol Buffers / DNSTAP logging
---------------------------------
`dnsdist` can log queries over TCP/IP using a simple Google Protocol Buffers streaming format. In the near future, native DNSTAP output will also be available. Use `RemoteLogAction` and `RemoteLogResponseAction`.

SNMP
----
`dnsdist` can send SNMP traps and export its statistics via a MIB.
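
As a hedged sketch of the SNMP support: the fragment below assumes the `snmpAgent()` and `SNMPTrapAction()` functions as described in the README for SNMP-enabled builds. Neither appears elsewhere in this document, and the trap message is a placeholder, so verify both calls against your version before relying on them.

```lua
-- Assumption: dnsdist was built with SNMP support and a local SNMP
-- daemon is running and accepts connections from dnsdist.
snmpAgent(true)  -- export statistics via the MIB and enable traps

-- Send a trap whenever an AXFR query arrives; the message is a placeholder.
addAction(QTypeRule(dnsdist.AXFR), SNMPTrapAction("AXFR query seen"))
```
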