Cache Reverse Proxy - Varnish
Introduction
Varnish is an HTTP accelerator; its official page is https://www.varnish-cache.org/.
Varnish sits in front of the web application server as a caching reverse proxy (it can also act as a load balancer). It can simply cache all static resources in memory, and it can also be powerfully configured with VCL (Varnish Configuration Language, a DSL for Varnish configuration) to cache dynamic content. In addition, Varnish implements the ESI (Edge Side Includes) standard, which makes it possible to cache the static parts of a page.
A web server fronted by Varnish can easily handle more than 10,000 requests/s on a single node; as a comparison, an Apache web server handles about 1,000 requests/s. This makes it extremely suitable, and strongly recommended, for "content-heavy dynamic websites" with highly concurrent traffic.
Varnish Software has made a vivid video demonstrating what Varnish is and how it makes the web fly.
Installation
There are two ways of installing it: via a package manager, or by compiling from source.
Install by package manager
For Linux distributions, follow the guide on the Varnish official download page; for Mac, simply run brew install varnish.
Compiling from source
Varnish relies on recent versions of GNU M4, autoconf, automake and libtool, so download them from a GNU mirror and compile/install them one by one:
For each of ["M4", "autoconf", "automake", "libtool"], do (shown here for M4):
curl -O http://mirrors.kernel.org/gnu/m4/m4-latest.tar.gz
tar xzf m4-latest.tar.gz
cd m4-1.4.16/
./configure --prefix=/usr/local
make && sudo make install
Then download the source from the Varnish official download page and install Varnish:
cd varnish-3.0.3
./autogen.sh
./configure
make
sudo make install
As for me, I miraculously forgot the existence of Homebrew and spent an hour compiling from source...
Make it work!
I have a Rails 3 website running at http://localhost:3000, so I can simply run:
sudo varnishd -a :80 -b http://localhost:3000 -s file,/tmp,500M -T localhost:6082
Argument explanation (we can always run varnishd --help):
- -a Binding address
- -b Backend server address
- -s Storage backend specification
- -T Telnet address (management interface), e.g. -T localhost:6082
- -F Run in foreground, with the runtime log shown in the terminal
This will cache all GET/HEAD requests for all resources, so we can simply set up Varnish in front of a static file server to gain a huge performance improvement.
However, since our website is dynamic, we need to deal with two typical scenarios:
- Some resources should be cached, but they may be updated at some point, and then the cache needs to be rebuilt. Simply running Varnish as above will NOT achieve this!
- Most of my website's functionality requires the user to be logged in. If I simply run Varnish as above, there is no security at all: all the sensitive data would be cached by Varnish, which is not acceptable!
I've investigated how to configure VCL to achieve No. 1; No. 2 can be done with ESI, which I will cover later.
VCL basics
When Varnish is installed, it generates a default.vcl under /etc/sysconfig/varnish or /etc/default/varnish on Linux distros, and under /usr/local/etc/varnish/ on Mac; all the content inside is commented out, ready for you to modify.
VCL has a number of subroutines, each invoked at a specific stage of the HTTP transaction; the process is shown below (shamelessly stolen from the MGM Tech blog):
There are several important built-in objects which can be accessed in functions:
req
The request object. When Varnish has received the request the req object is created and populated. Most of the work you do in vcl_recv you do on or with the req object.
beresp
The backend response object. It contains the headers of the object coming from the backend. Most of the work you do in vcl_fetch you do on the beresp object.
obj
The cached object. Mostly a read only object that resides in memory. obj.ttl is writable, the rest is read only.
I have a resource exposed by Rails at http://localhost/doc. I expect it to be cached by Varnish, and the cache to be refreshed when someone POSTs to update it. To achieve this, I need to cache the resource for all GET/HEAD requests; however, when an update request (a POST) comes in, Varnish should purge the object it has cached. This is done in three steps:
- Set default backend in VCL:
backend default {
  .host = "127.0.0.1";
  .port = "3000";
}
- Tell Varnish to ban the cached object when the request is an HTTP POST AND the server says "no cache":
sub vcl_fetch {
  if (req.request == "POST" && beresp.http.Cache-Control == "no-cache") {
    ban("req.url == " + req.url);
  }
  return (deliver);
}
- Restart Varnish and tell it to use this VCL configuration:
sudo varnishd -a :80 -s file,/tmp,500M -T localhost:6082 -F -f /usr/local/etc/varnish/default.vcl
ban is a new action added in Varnish 3.0; it replaces the former purge and purge_url actions. purge still exists, but can now only be used without arguments.
Finally I update the "edit" action of the resource controller:
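The original post embedded the controller code here, but the embed is gone; below is a hedged sketch of what it must have done. In a Rails 3 action the essential line is response.headers["Cache-Control"] = "no-cache" after a successful save; the method and status codes here are my own illustration, not the author's code.

```ruby
# Sketch only: the original embed is lost. The key point is that a
# successful update must answer with a Cache-Control: no-cache header,
# which is what the ban() rule in vcl_fetch keys on.
# In a Rails 3 controller action that is one line:
#   response.headers["Cache-Control"] = "no-cache"
#
# Stand-alone illustration of the (status, headers) pair the backend
# must return for the cache invalidation to fire:
def update_response(saved)
  status  = saved ? 200 : 422
  headers = saved ? { "Cache-Control" => "no-cache" } : {}
  [status, headers]
end
```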
Now if I update the resource:
HTTP/1.1 POST http://localhost/doc
Varnish will first pass the request through to the backend; after the resource is updated inside Rails, the controller returns a "Cache-Control: no-cache" header, and my VCL then purges (bans) the requested URL, so that the next time a GET request comes in, Varnish reloads the resource from the backend and the cache is rebuilt!
Below are some VCL examples I collected online:
Honor the Cache-Control header!
# Without this block, Varnish would guess whether the response is cacheable, which can result in unexpected caching
if(obj.http.Pragma ~ "no-cache" ||
obj.http.Cache-Control ~ "no-cache" ||
obj.http.Cache-Control ~ "private") {
return(pass);
}
Force refresh
Always look up the backend when a client fires a "force refresh" request, e.g. Cmd-Shift-R on Mac or Ctrl+F5 in IE (this snippet assumes an ACL named editors has been defined):
if (req.http.Cache-Control ~ "no-cache" && client.ip ~ editors) {
set req.hash_always_miss = true;
}
Remove cookie headers for images
sub vcl_fetch {
if (req.url ~ "\.(png|gif|jpg)$") {
unset beresp.http.set-cookie;
set beresp.ttl = 1h;
}
}
Pass sensitive data to the backend
For basic HTTP authentication:
if (req.http.Authorization) {
# Not cacheable by default #
return(pass);
}
For a Java EE web application:
sub vcl_recv {
if (req.http.cookie ~ "JSESSIONID") {
std.log("found jsessionid in request, passing to backend server"); # import std;
return (pass);
}
}
A tip for debugging VCL: add the std module in the VCL file (import std;) so that we can print some useful logs from VCL:
std.syslog(888, "Purge cache for: " + req.url);
Cache invalidation
The VCL below sets up an access control list named "purgers" and exposes an HTTP PURGE interface:
acl purgers { "127.0.0.1"; }
sub vcl_recv {
if (req.request == "PURGE") {
if (!client.ip ~ purgers) {
error 405 "Method not allowed";
}
return (lookup);
}
}
sub vcl_fetch {
std.syslog(888, "vcl_fetch!!!!!!!!!!!!!!!");
if (req.request == "POST" && beresp.http.Cache-Control == "no-cache") {
std.syslog(888, "Purge cache for: " + req.url);
ban("req.url == " + req.url);
}
return (deliver);
}
sub vcl_hit {
if (req.request == "PURGE") {
purge;
error 200 "Purged";
}
}
sub vcl_miss {
if (req.request == "PURGE") {
purge;
error 200 "Purged";
}
}
sub vcl_pass {
if (req.request == "PURGE") {
error 502 "PURGE on a passed object";
}
}
So when an HTTP PURGE request (curl -X PURGE http://localhost/doc) is sent from one of the "purgers", Varnish purges the cache.
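For completeness, the same PURGE can be issued from Ruby (say, from a model callback) instead of curl. This is a hedged sketch, not from the original post: Net::HTTP has no built-in PURGE verb, so a small request class is defined the same way the stdlib defines Net::HTTP::Get; the purge_cache helper name is my own.

```ruby
require "net/http"

# Net::HTTP ships no PURGE verb, so define one the way the stdlib
# defines its own request classes (Get, Post, ...).
class Net::HTTP::Purge < Net::HTTPRequest
  METHOD = "PURGE"
  REQUEST_HAS_BODY = false
  RESPONSE_HAS_BODY = true
end

# Hypothetical helper: send PURGE for a URL; the caller's IP must be
# in the "purgers" ACL or Varnish answers 405.
def purge_cache(url)
  uri = URI.parse(url)
  Net::HTTP.start(uri.host, uri.port) do |http|
    http.request(Net::HTTP::Purge.new(uri.request_uri))
  end
end

# purge_cache("http://localhost/doc")
```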
References
https://www.varnish-cache.org/trac/wiki/Introduction
https://www.varnish-cache.org/docs/3.0/tutorial/vcl.html
https://www.varnish-software.com/static/book/VCL_functions.html
http://blog.mgm-tp.com/2012/01/varnish-web-cache/
https://www.varnish-cache.org/trac/wiki/VCLExampleEnableForceRefresh
VAC 2.0.3 with high performance cache invalidation API (aka the Super Fast Purger)