DNS is one of those services you only notice when it breaks. In a serious homelab, I want more than “it works”:
- Network-wide filtering (ads/trackers/malware) without touching every device
- Split-horizon / authoritative zones for internal services
- Fast resolution under load (low latency + high QPS)
- Autonomy when upstreams or the WAN get flaky
- Security controls (encrypted upstreams + DNSSEC validation)
- Repeatability: the whole thing is deployed, validated, and re-deployed via Ansible
So I built a DNS Chain. Overengineered on purpose.
What this post matches#
This post reflects my current Ansible role and host layout:
- `dnsdist` listens on :53 and load-balances into a backend pool
- `pihole` runs as N containers on 127.0.0.1:9991–999N; in my lab, N = (CPU cores − 1), which currently equals 7
- `bind9` listens on 127.0.0.1:1053
- `unbound` listens on 127.0.0.1:2054
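One habit that pays off later: because every layer has its own loopback port, each hop can be queried in isolation. A quick sketch like this (ports from the layout above, domain arbitrary) tells you which layer is broken before you read a single log:

```sh
# Front door: dnsdist on :53 (what clients actually see)
dig @127.0.0.1 -p 53 example.com +short

# One specific Pi-hole backend
dig @127.0.0.1 -p 9991 example.com +short

# Bind9 directly (authoritative zones + forwarding)
dig @127.0.0.1 -p 1053 example.com +short

# Unbound directly (recursion + DoT upstream)
dig @127.0.0.1 -p 2054 example.com +short
```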
Architecture#
High-level flow#
```mermaid
flowchart LR
    C["Clients (LAN/VPN)"] -->|"UDP/TCP 53"| D["dnsdist :53<br/>LB + health checks + packet cache"]
    D --> P["Pi-hole pool xN<br/>127.0.0.1:9991-999N<br/>blocking + local cache"]
    P --> B["Bind9<br/>127.0.0.1:1053<br/>authoritative zones / split-horizon"]
    B --> U["Unbound<br/>127.0.0.1:2054<br/>recursive cache + DoT + DNSSEC validation*"]
    U -->|"TLS 853"| Up[("Cloudflare / Quad9 / Google")]
```
* DNSSEC validation is intended to happen in Unbound (more below). I also include a concrete test so I can prove it’s actually enabled.
Request flow (the big “back-and-forth” diagram)#
This is the single diagram I use when I’m debugging. If you can mentally simulate this flow, you can usually pinpoint where things went wrong in under a minute.
```mermaid
sequenceDiagram
    participant C as Client
    participant D as dnsdist :53
    participant P as Pi-hole (pool)
    participant B as Bind9 :1053
    participant U as Unbound :2054
    participant O as Upstream DoT :853
    C->>D: Query A/AAAA
    alt dnsdist packet-cache HIT
        D-->>C: Answer (cache)
    else MISS
        D->>P: forward (LB + health check)
        alt Blocked by Pi-hole policy
            P-->>D: Blocking answer (NXDOMAIN/0.0.0.0)
            D-->>C: blocked
        else Allowed
            P->>B: forward
            alt Internal zone hit
                B-->>P: authoritative answer
            else External domain
                B->>U: forward
                U->>O: DoT + (DNSSEC validate)
                O-->>U: response
                U-->>B: response
                B-->>P: response
            end
            P-->>D: response
            D-->>C: response
        end
    end
```
Why this exact order#
1) dnsdist: a “front door” that stays fast under load#
dnsdist earns its place by doing three things well:
- Load balancing across multiple Pi-hole backends
- Health checks (unhealthy backends are automatically avoided)
- Packet cache for the hottest queries (answer without touching downstream layers)
This matters because it keeps client configuration simple: clients always use one DNS IP (this host) on port 53.
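My actual dnsdist.conf is templated by the Ansible role; a minimal hand-written sketch of those three jobs could look like this (backend count follows my N = 7 layout, cache sizes are illustrative, and ACLs are covered later in this post):

```lua
-- dnsdist.conf sketch: LB + health checks + packet cache
-- One listener on :53 is all clients ever need to know about
addLocal("0.0.0.0:53")

-- Backends: the Pi-hole pool on loopback ports 9991..9997
-- checkName gives the built-in health check something real to query
for i = 1, 7 do
  newServer({address = "127.0.0.1:" .. (9990 + i), checkName = "example.com."})
end

-- Spread load across healthy backends; unhealthy ones are skipped
setServerPolicy(leastOutstanding)

-- Packet cache: answer hot queries without touching the backends at all
pc = newPacketCache(100000, {maxTTL = 86400, minTTL = 0})
getPool(""):setCache(pc)
```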
2) Pi-hole xN: filtering at the edge, scaled horizontally#
Pi-hole is a convenient "policy layer" for the whole network. I run multiple instances because doing so provides:
- Isolation: one container restart does not nuke the whole service
- Throughput headroom: load spreads across instances
- Operational flexibility: different lists/behavior can be tested on a subset (if desired)
Implementation detail: containers bind to distinct loopback ports (127.0.0.1:9991–999N), and dnsdist distributes traffic.
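The role drives this through templates, but hand-rolled, the per-instance port binding would look roughly like this (image tag, timezone, and the rest of the container config are illustrative, not my exact setup):

```sh
# Spin up N Pi-hole containers, each bound to its own loopback port.
# Ports follow the 9991..999N convention from the layout above.
for i in $(seq 1 7); do
  docker run -d \
    --name "pihole${i}" \
    -p "127.0.0.1:999${i}:53/udp" \
    -p "127.0.0.1:999${i}:53/tcp" \
    -e TZ="Europe/Berlin" \
    pihole/pihole:latest
done
```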
3) Bind9: authoritative split-horizon zones#
Bind9 is where my internal universe lives:
- authoritative zones (e.g., `lab.internal`)
- internal records for services (`git.lab.internal`, `wiki.lab.internal`, etc.)
- optional split-horizon logic (internal view vs external)
If Bind9 can answer from an authoritative zone, it replies immediately. Otherwise, it forwards “the internet” further down the chain.
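A trimmed named.conf sketch of exactly that behavior (zone file path and the split-horizon views are omitted or placeholders; ports match my layout):

```conf
// named.conf sketch: answer internal zones, forward everything else to Unbound
options {
    listen-on port 1053 { 127.0.0.1; };
    recursion yes;
    forwarders { 127.0.0.1 port 2054; };
    forward only;
};

zone "lab.internal" {
    type master;
    file "/etc/bind/zones/db.lab.internal";  // placeholder path
};
```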
4) Unbound: recursive engine, cache, and upstream security#
Unbound is my “last hop” for external domains:
- large recursive cache (and aggressive performance tuning)
- DoT (DNS-over-TLS) to upstream providers
- resilience features like serve-expired (use cached records during upstream turbulence)
- a sensible place to enforce DNSSEC validation in one component
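The core of the corresponding unbound.conf, as a sketch (thread count and cache sizes are illustrative, the CA bundle path is distro-dependent, and the upstream list is one plausible combination):

```conf
# unbound.conf sketch: recursion, cache, DoT upstreams, DNSSEC validation
server:
    interface: 127.0.0.1
    port: 2054
    num-threads: 4                  # scaled with CPU in the real role
    msg-cache-size: 128m            # illustrative sizes
    rrset-cache-size: 256m
    prefetch: yes
    auto-trust-anchor-file: "/var/lib/unbound/root.key"
    tls-cert-bundle: "/etc/ssl/certs/ca-certificates.crt"  # distro-dependent

forward-zone:
    name: "."
    forward-tls-upstream: yes
    forward-addr: 1.1.1.1@853#cloudflare-dns.com
    forward-addr: 9.9.9.9@853#dns.quad9.net
```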
Caching strategy: layered on purpose#
Yes, there is caching at multiple layers. That is intentional.
- dnsdist packet cache: fastest possible responses for repeat queries
- Pi-hole cache: local caching close to the policy decision (block/allow)
- Bind9: instant answers for internal authoritative zones + cache for forwarded lookups
- Unbound: the heavy recursive cache + prefetch + serve-expired
The practical result: most “normal browsing” queries become very low latency once the caches are warm, and the system stays stable under bursts.
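An easy way to watch the layering work is to compare Query time on repeated lookups against the front door (domain is arbitrary):

```sh
# First lookup is a miss and walks the full chain; repeats hit a cache
dig @127.0.0.1 example.com | grep "Query time"
dig @127.0.0.1 example.com | grep "Query time"
# Typical pattern: tens of ms cold, 0-1 ms once a cache layer is warm
```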
Autonomy: when upstreams fail#
Homelabs are where networking experiments happen: firewall restarts, VPN changes, routing updates, ISP hiccups.
Unbound can be tuned to keep things usable via serve-expired and prefetching. The goal isn’t perfection; it’s graceful degradation: internal services keep resolving, and external browsing is less likely to collapse immediately.
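In unbound.conf terms, these are the knobs I mean (the TTL and timeout values are illustrative, tune to taste):

```conf
# unbound.conf sketch: graceful degradation when upstreams misbehave
server:
    prefetch: yes                       # refresh popular records before expiry
    serve-expired: yes                  # answer from stale cache if needed
    serve-expired-ttl: 86400            # how long stale answers stay usable (seconds)
    serve-expired-client-timeout: 1800  # ms to wait before falling back to stale
```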
DNSSEC: security goal, concrete test#
If I say “DNSSEC”, I want it to be verifiable.
Where it should happen: in Unbound (single enforcement point).
How I prove it: I test a known-bad DNSSEC domain. If validation is on, the resolver should return SERVFAIL.
```sh
# Should resolve (often signed)
dig @127.0.0.1 -p 2054 cloudflare.com +dnssec

# Should SERVFAIL when DNSSEC validation is actually enabled
dig @127.0.0.1 -p 2054 dnssec-failed.org +dnssec
```

If this does not SERVFAIL, DNSSEC validation is not actually being enforced (or you are not testing the right resolver/port).
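The positive case is checkable too: on a validating resolver, a correctly signed domain should come back with the ad (authenticated data) flag set:

```sh
# Expect something like "flags: qr rd ra ad" — the "ad" bit means validation succeeded
dig @127.0.0.1 -p 2054 cloudflare.com +dnssec | grep flags
```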
Security warning: do not become an open resolver#
Two rules I consider non-negotiable:
- Restrict who can query you. Enforce LAN/VPN-only access with firewall rules and/or dnsdist ACLs.
- Never expose this to the public internet. A publicly reachable recursive resolver will be abused.
I treat dnsdist ACLs and host firewall policy as part of “the design”, not an afterthought.
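On the dnsdist side that is a `setACL(...)` call in the config; on the host side, a minimal nftables sketch looks like this (table name, interface-free matching, and the LAN/VPN subnets are placeholders for your own ranges):

```sh
# Allow DNS only from LAN/VPN ranges, drop everything else (placeholder subnets!)
nft add table inet dns_guard
nft add chain inet dns_guard input '{ type filter hook input priority 0; }'
nft add rule inet dns_guard input 'ip saddr { 192.168.0.0/16, 10.8.0.0/24 } udp dport 53 accept'
nft add rule inet dns_guard input 'ip saddr { 192.168.0.0/16, 10.8.0.0/24 } tcp dport 53 accept'
nft add rule inet dns_guard input 'udp dport 53 drop'
nft add rule inet dns_guard input 'tcp dport 53 drop'
```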
Performance tuning (kernel + service knobs)#
I tune the host because high-QPS DNS is mostly “fast UDP + lots of sockets”, and defaults are designed for general-purpose servers.
Example sysctl groups from my role:
- socket buffers (UDP/TCP)
- backlog limits
- TCP reuse/timeouts (important for TLS upstreams)
```yaml
- name: tune sysctl for dns
  ansible.posix.sysctl:
    sysctl_file: /etc/sysctl.d/9999-ansible-dns.conf
    name: "{{ item.name }}"
    value: "{{ item.value }}"
    sysctl_set: yes
  loop:
    - { name: "net.core.rmem_max", value: "4194304" }
    - { name: "net.core.wmem_max", value: "4194304" }
    - { name: "net.core.somaxconn", value: "65535" }
    - { name: "net.ipv4.tcp_tw_reuse", value: "1" }
```

On the service side:
- Bind9 runs with multiple worker threads
- Unbound scales `num-threads` with the CPU count
- dnsdist is configured for caching and backend distribution
- Pi-hole instances are isolated and can be pinned with cpusets
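The pinning part is nothing exotic; for containers it can be as simple as this (core numbers are illustrative, the idea is one core per instance with core 0 left free):

```sh
# Pin each Pi-hole container to its own core
for i in $(seq 1 7); do
  docker update --cpuset-cpus="${i}" "pihole${i}"
done
```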
Ansible as a contract: deploy and verify#
I do not trust a DNS deploy that does not validate itself.
My role runs sanity checks for the configs and then performs live resolution tests with retries. If resolution fails, the role fails immediately.
```yaml
- name: Attempt DNS resolution
  ansible.builtin.command: "dig @{{ resolution_host }} -p {{ resolution_port }} google.com +short"
  register: result
  until: result.rc == 0
  retries: 5
  delay: 10
```

This turns "I think I deployed DNS" into "I can prove it works end-to-end".
Why I keep it this way#
This chain gives me:
- Network-wide ad/track blocking across every device (including IoT)
- Internal naming that feels like a real environment (authoritative zones, split-horizon)
- Fast hot-path DNS with multiple caching layers
- Resilience when upstreams or WAN connectivity are not perfect
- A safe lab platform for experiments with resolvers, upstream providers, and policy
- Automation and reproducibility: rebuildable from scratch using Ansible
Yes, it’s overengineering. But it’s the kind that buys me what I actually care about: autonomy, security, and speed in a homelab environment.