Cache Reverse Proxy - nginx

Introduction

On the homepage of the nginx wiki, there used to be a sentence that really impressed me when I first looked into nginx three years ago:

Apache is like Microsoft Word, it has a million options but you only need six. Nginx does those six things, and it does five of them 50 times faster than Apache.  --Chris Lea

It is no longer there, but you can still find it by searching the web: example link.

One of the six things nginx does, and does beautifully, is sit in front of the web application server as a Cache Reverse Proxy, which performs two major tasks and, optionally, a third:

  1. Receive all the requests and dispatch them to the backend server(s)
  2. Cache responses to speed up HTTP transactions significantly
  3. Play a role as a load balancer (optional).

What are the benefits of doing this? There are many; the items below are copied from Wikipedia:

  • Reverse proxies can hide the existence and characteristics of the origin server(s).
  • Application firewall features can protect against common web-based attacks. Without a reverse proxy, removing malware or initiating takedowns, for example, can become difficult.
  • In the case of secure websites, the SSL encryption is sometimes not performed by the web server itself, but is instead offloaded to a reverse proxy that may be equipped with SSL acceleration hardware. (See SSL termination proxy)
  • A reverse proxy can distribute the load from incoming requests to several servers, with each server serving its own application area. In the case of reverse proxying in the neighborhood of web servers, the reverse proxy may have to rewrite the URL in each incoming request in order to match the relevant internal location of the requested resource.
  • A reverse proxy can reduce load on its origin servers by caching static content, as well as dynamic content. Proxy caches of this sort can often satisfy a considerable amount of website requests, greatly reducing the load on the origin server(s). Another term for this is web accelerator.
  • A reverse proxy can optimize content by compressing it in order to speed up loading times.
  • In a technique known as "spoon feeding", a dynamically generated page can be produced all at once and served to the reverse proxy, which can then return it to the client a little bit at a time. The program that generates the page is not forced to remain open and tie up server resources during the possibly extended time the client requires to complete the transfer.
  • Reverse proxies can be used whenever multiple web servers must be accessible via a single public IP address. The web servers listen on different ports in the same machine, with the same local IP address or, possibly, on different machines and different local IP addresses altogether. The reverse proxy analyzes each incoming call and delivers it to the right server within the local area network.
  • Reverse proxies can be used to perform A/B testing and Multivariate testing without placing javascript tags or code into pages.

 

This post focuses on tasks 1 and 2 with dynamic content from a backend web server. Task 3 is simple enough: nginx provides an HTTP upstream module for it.

The scenario

Imagine a web resource that is frequently accessed and only occasionally updated, but for which freshness matters, for example:

  1. A relatively static homepage with high concurrent requests.
  2. A wiki page shared within an enterprise.

For this kind of resource, the ideal behavior is:

nginx caches the resource and refreshes the cache as soon as the resource is updated, so that all further requests retrieve the latest version.
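To make that behavior concrete, here is a minimal in-memory sketch in Ruby. It is only a toy model, not how nginx works internally; `TinyCache`, `fetch`, and `purge` are hypothetical names invented for this illustration.

```ruby
# Toy model of a caching proxy: serve from the cache when possible,
# and drop the entry when the resource is updated.
class TinyCache
  attr_reader :backend_hits

  def initialize(&backend)
    @backend = backend   # block that produces the fresh response for a path
    @store = {}          # path => cached response body
    @backend_hits = 0    # how many times the backend was actually called
  end

  # GET: answer from the cache, or ask the backend once and remember the result.
  def fetch(path)
    return @store[path] if @store.key?(path)
    @backend_hits += 1
    @store[path] = @backend.call(path)
  end

  # Called after an update so that the next GET rebuilds the entry.
  def purge(path)
    @store.delete(path)
  end
end

counter = 1
cache = TinyCache.new { |_path| "{\"Counter\":#{counter}}" }

first  = cache.fetch("/doc")   # miss: goes to the backend
second = cache.fetch("/doc")   # hit: the backend is not called again
counter += 1                   # the resource changes...
stale  = cache.fetch("/doc")   # ...but the cache still holds the old body
cache.purge("/doc")
fresh  = cache.fetch("/doc")   # miss again: the new value is fetched
```

The gap between `stale` and `fresh` is the problem the rest of this post deals with: without a purge step, readers keep getting the old Counter until the cached entry expires.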

How to do it

I built a Rails 3 app that exposes one resource: http://localhost:3000/doc. It is very simple: each time a GET request comes in, it reads a JSON file stored on disk:

{
  "Counter": 1
}

And each time a POST request comes in, it increments Counter by one.

The routes.rb:

get 'doc' => 'doc#view'
post 'doc' => 'doc#edit'

The view and edit actions in controller:

before_filter do
   @json_path = Rails.root.join("public", "doc.json")
end

def view
   response.header["Cache-Control"] = "public"
   render :layout => false
end

def edit
   increased_counter = increase_counter
   response.header["Cache-Control"] = "no-cache"
   response.header["Expires"] = "-1"

   render :text => "Successfully Updated! The updated Counter is: #{increased_counter}"
end

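def increase_counter
   # Read the current value, add one, and write the file back.
   counter = JSON.load(File.read(@json_path))["Counter"] + 1
   # The block form closes the file even if the write raises.
   File.open(@json_path, 'w') do |file|
      file.write({ :Counter => counter }.to_json)
   end
   counter
end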

The simple view, which renders the JSON:

<%= JSON.load(@json_path).to_json %>
<form action="" method="post" accept-charset="utf-8">
  <input type="submit" name="Edit" id="" value="Edit" />
</form>


The Rails server is bound to port 3000. Next I configure nginx; my nginx.conf is located at /usr/local/etc/nginx/nginx.conf:

http {
  proxy_cache_path  /var/www/cache levels=1:2 keys_zone=my-cache:8m max_size=1000m inactive=600m;
  proxy_temp_path /var/www/cache/tmp;

  server {
      listen       80;
      server_name  localhost;

      location / {
          # Reverse proxy
          proxy_pass        http://localhost:3000;
          proxy_redirect off;

          # Cache
          proxy_cache my-cache;
          proxy_cache_valid  200 302  60m;
          proxy_cache_valid  404      1m;
          proxy_cache_bypass   $http_secret_header;
          add_header X-Cache-Status $upstream_cache_status;

          proxy_set_header  Host            $host;
          proxy_set_header  X-Real-IP       $remote_addr;
          proxy_set_header  X-Forwarded-For $proxy_add_x_forwarded_for;
      }
  }
}

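The key directives are:

  • proxy_cache_path: the directory where nginx stores cached files, plus the shared memory zone (my-cache) that holds the cache keys
  • proxy_temp_path: the directory where nginx writes temporary files while it receives a response from the backend
  • proxy_pass: the backend web server location (my Rails app in this example)
  • proxy_cache_valid: how long responses with the given status codes stay cached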

For more information about the nginx proxy module configuration, see: http://nginx.org/en/docs/http/ngx_http_proxy_module.html

Now if I reload nginx with sudo nginx -s reload and visit http://localhost/doc, I can see the JSON with Counter equal to 1 rendered on the web page:

[Screenshot: JSON Counter]

And if we cd into /var/www/cache, we can see nginx has cached the response:

[Screenshot: nginx cache]
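The file names in that directory look random, but they are derived from the cache key. The sketch below shows how the path could be computed for levels=1:2. It assumes the default cache key is in effect (the nginx documentation describes it as close to $scheme$proxy_host$uri$is_args$args) and that the file name is the MD5 of that key; `cache_file_path` and `default_cache_key` are hypothetical helper names.

```ruby
require "digest"

# Build the on-disk path for an MD5 hex digest under levels=1:2:
# the first directory is the last character of the digest,
# the second directory is the two characters before it.
def cache_file_path(md5_hex, cache_root)
  level1 = md5_hex[-1]
  level2 = md5_hex[-3, 2]
  File.join(cache_root, level1, level2, md5_hex)
end

# Assumption: no custom proxy_cache_key is configured, so the key is
# scheme + proxied host (with port) + request URI.
def default_cache_key(scheme, proxy_host, request_uri)
  "#{scheme}#{proxy_host}#{request_uri}"
end

key = default_cache_key("http", "localhost:3000", "/doc")
puts cache_file_path(Digest::MD5.hexdigest(key), "/var/www/cache")
```

If the assumptions hold, the printed path should match one of the files under /var/www/cache after the first request to /doc.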

Important: nginx honors the HTTP Cache-Control response header. If its value is set to "no-cache" or "private", nginx will NOT cache the response.
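As a rough illustration of that rule, the following Ruby sketch decides whether a Cache-Control value would block caching. It is a simplified model, not nginx's actual logic: `nginx_would_cache?` is a hypothetical name, "no-store" is included because nginx treats it the same way, and other inputs (Expires, Set-Cookie, Vary, X-Accel-Expires, max-age, proxy_ignore_headers) are deliberately left out.

```ruby
# Simplified model: a response is skipped when its Cache-Control header
# contains private, no-cache, or no-store.
# Assumption: every other factor nginx considers is ignored here.
def nginx_would_cache?(cache_control)
  directives = cache_control.to_s.downcase.split(",").map(&:strip)
  blocked = %w[private no-cache no-store]
  (directives & blocked).empty?
end

puts nginx_would_cache?("public")    # the header set by the view action
puts nginx_would_cache?("no-cache")  # the header set by the edit action
```

This is why the view action sets "public" while the edit action sets "no-cache": only the GET response is meant to end up in the cache.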

Now I am going to update the Counter:

[Screenshot: update counter]

When I visit http://localhost/doc again, I still see Counter as 1, because nginx serves the cached response.

Important: once nginx has cached the response for a resource, further GET requests for that resource are answered by nginx from the cache until the entry expires (60 minutes for a 200 response in the config above). The backend server is not hit at all.

Now comes the key point. By default nginx passes every HTTP POST request to the backend without looking up the cache, so the edit action always reaches the Rails app. That gives the app a chance to purge the nginx cache for the specific resource, so that the next request makes nginx rebuild the cache.

One tool for purging the nginx cache is a CLI script called nginx-cache-purge.

The usage is:

nginx-cache-purge "foobar.cs" /var/www/cache/nginx/baz

So I can add a line that invokes it to the "edit" action in my doc controller:

system("nginx-cache-purge 'doc' /var/www/cache")

Now every time we edit the Counter with an HTTP POST, the Rails app invokes the CLI to purge the "/doc" cache entry, so the next request makes nginx cache the resource again. We are all set!
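The one-line system call works, but it goes through a shell and ignores failures. A slightly more defensive variant might look like the sketch below. `purge_nginx_cache` is a hypothetical helper name, and the injectable `runner` argument is only there so the method can be tried without the tool installed; the argument order follows the usage line shown earlier.

```ruby
# Run the purge tool without going through a shell and report failure.
# Kernel#system returns true on success, false on a non-zero exit status,
# and nil when the command could not be executed.
def purge_nginx_cache(pattern, cache_dir, runner = Kernel.method(:system))
  ok = runner.call("nginx-cache-purge", pattern, cache_dir)
  warn "nginx-cache-purge failed for #{pattern}" unless ok
  ok ? true : false
end
```

In the edit action this would be called as purge_nginx_cache("doc", "/var/www/cache"). Passing the arguments separately avoids shell quoting problems if the pattern ever comes from user input.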

Fantastic!

nginx itself also offers an HTTP-level way to get past a cached entry: the proxy_cache_bypass directive of the proxy module. It does not delete anything; it lets website administrators define a condition under which nginx skips the cache lookup. The nginx.conf above contains these two lines:

proxy_cache_bypass   $http_secret_header;
add_header X-Cache-Status $upstream_cache_status;
    

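So if a client sends an HTTP request that contains the header Secret-Header: 1, nginx skips the cache lookup and forwards the request to the backend, regardless of whether the requested resource is cached. Unless proxy_no_cache is also set, the fresh response is then stored and replaces the cached copy. The X-Cache-Status response header shows what happened.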
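A client-side sketch of such a request, using Ruby's standard Net::HTTP library, could look like this. `build_bypass_request` and `fetch_bypassing_cache` are hypothetical helper names, and the URL assumes nginx is listening on localhost port 80 as in the config above.

```ruby
require "net/http"
require "uri"

# Build a GET request carrying the header that $http_secret_header reads.
def build_bypass_request(url)
  uri = URI.parse(url)
  request = Net::HTTP::Get.new(uri.request_uri)
  request["Secret-Header"] = "1"
  [uri, request]
end

# Send the request and return the body together with the X-Cache-Status
# header that the add_header line in the config attaches to the response.
# Not called here, because it needs a running nginx.
def fetch_bypassing_cache(url)
  uri, request = build_bypass_request(url)
  response = Net::HTTP.start(uri.host, uri.port) { |http| http.request(request) }
  [response.body, response["X-Cache-Status"]]
end

uri, request = build_bypass_request("http://localhost/doc")
puts "#{request.method} #{uri.host}#{request.path} Secret-Header: #{request['Secret-Header']}"
```

With nginx running, calling fetch_bypassing_cache("http://localhost/doc") should return the current Counter, and the status value is expected to be BYPASS rather than HIT.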

[Screenshot: Bypass cache]

Conclusion

nginx is a powerful and fast web server, and it is easy to use and configure as well. Using nginx as a Cache Reverse Proxy can dramatically improve your website's loading speed and its capacity for concurrent requests, which improves UX and lowers server cost. That's why it is used by so many popular websites.

References

https://github.com/FRiCKLE/ngx_cache_purge/ (an nginx cache purge module)

http://serverfault.com/questions/30705/how-to-set-up-nginx-as-a-caching-reverse-proxy

http://nginx.org/en/docs/http/ngx_http_proxy_module.html

http://wiki.nginx.org/HttpProxyModule

https://github.com/perusio/nginx-cache-purge 
