Using CoralCDN as a server operator

(Note: Due to excessive spam, we have turned off public editing of this page. If you would like something added, please email us.)

There are a number of mechanisms by which you can incorporate Coral into your site to reduce your traffic load and decrease your bandwidth bills. Here, we describe some simple approaches for taking advantage of Coral.

  • Rewrite inline links, such as for images, to have Coralized absolute URLs.
  • Issue HTTP redirects (302) from one of your normal URLs to the Coralized URL of that page. Make sure that requests from Coral proxies themselves are not redirected, since Coral must be able to fetch the page from your server in the first place. Requests from Coral proxies carry CoralWebPrx in their User-Agent: and Via: headers. If you skip this check, a request may result in an infinite redirect loop.
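As a rough illustration of both approaches, the following Python sketch builds a Coralized absolute URL and detects requests that come from a Coral proxy. The function names are hypothetical, the port 8080 is taken from the mod_rewrite examples on this page and may not match your deployment, and the sketch assumes the origin runs on its default port and that headers arrive in a plain dictionary keyed by their canonical names.

```python
from urllib.parse import urlsplit, urlunsplit

CORAL_SUFFIX = ".nyud.net"  # suffix used in the examples on this page
CORAL_PORT = 8080           # assumption: port from this page's mod_rewrite examples


def coralize(url, port=CORAL_PORT):
    """Return the Coralized form of an absolute URL (hypothetical helper)."""
    parts = urlsplit(url)
    host = parts.hostname
    if not host:
        raise ValueError("expected an absolute URL with a host name")
    if host.endswith(CORAL_SUFFIX):
        return url  # already Coralized; leave untouched
    if parts.port is not None:
        # Assumption: origins on non-default ports are out of scope here.
        raise ValueError("origin URLs with an explicit port are not handled")
    netloc = host + CORAL_SUFFIX
    if port is not None:
        netloc += ":" + str(port)
    return urlunsplit((parts.scheme, netloc, parts.path, parts.query, parts.fragment))


def is_coral_proxy(headers):
    """True if the request headers identify a Coral proxy (CoralWebPrx)."""
    user_agent = headers.get("User-Agent", "")
    via = headers.get("Via", "")
    return "CoralWebPrx" in user_agent or "CoralWebPrx" in via
```

A page generator could call coralize() on each inline image URL, and a redirect handler could skip the 302 whenever is_coral_proxy() returns True.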

Please feel free to add the mechanisms and code by which you incorporate Coral as they arise.

Setting expiry times on content

If no expiry time is set for a file served through Coral, the default period is 12 hours. After content expires, Coral proxies make a conditional request back to the origin website. A conditional GET "requests that the entity be transferred only under the circumstances described by the conditional header field(s). The conditional GET method is intended to reduce unnecessary network usage by allowing cached entities to be refreshed without requiring multiple requests or transferring data already held by the client."
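To make the conditional request concrete, here is a minimal Python sketch of how an origin might decide between a full response (200) and "not modified" (304). It assumes the conditional header is If-Modified-Since; this page does not say which conditional header Coral proxies send, and most web servers already handle this for static files. The function name is hypothetical.

```python
from datetime import timezone
from email.utils import parsedate_to_datetime


def conditional_get_status(if_modified_since, last_modified):
    """Pick a status code for a conditional GET (hypothetical helper).

    if_modified_since: raw header value, or None if the header is absent.
    last_modified: timezone-aware datetime of the file's last change.
    """
    if not if_modified_since:
        return 200  # unconditional request: send the full entity
    try:
        cached = parsedate_to_datetime(if_modified_since)
    except (TypeError, ValueError):
        return 200  # unparseable date: ignore the condition
    if cached.tzinfo is None:
        cached = cached.replace(tzinfo=timezone.utc)
    # HTTP dates have one-second resolution, so drop microseconds first.
    current = last_modified.replace(microsecond=0)
    return 304 if current <= cached else 200
```

A 304 response carries no body, so refreshing an unchanged file costs the origin only a few hundred bytes of headers.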

To specify an expiry time other than 12 hours, set the period in your server configuration files. For example, in Apache:

<IfModule mod_expires.c>
<LocationMatch "/coral/(stats|imgs/maps)/">
ExpiresActive on
ExpiresDefault "access plus 5 minutes"
</LocationMatch>
</IfModule>

Apache mod_rewrite

If you run an Apache web server, you can use mod_rewrite to issue HTTP redirects. The method for enabling mod_rewrite varies with the Apache version and operating system, but for Apache 2 it is usually as simple as removing the # from the following line in the httpd.conf file:

#LoadModule rewrite_module modules/mod_rewrite.so

Once mod_rewrite is enabled, you will need to insert rewrite rules into your httpd.conf file. For example, if your site is served at http://foo.bar, the following will redirect all requests whose URL starts with http://foo.bar/images/foo to the Coralized version:

RewriteEngine on
RewriteCond %{HTTP_USER_AGENT} !^CoralWebPrx
RewriteCond %{QUERY_STRING} !(^|&)coral-no-serve$
RewriteRule ^/images/foo(.*)$ http://foo.bar.nyud.net:8080/images/foo$1 [R,L]

You can also make use of the HTTP_HOST variable:

RewriteRule ^/images/foo(.*)$ http://%{HTTP_HOST}.nyud.net:8080/images/foo$1 [R,L]

One upside of using rewrite rules like this is that the server still sees each request in its webserver logs. Of course, the server will still receive one HTTP request per link; however, its upstream bandwidth will be spent only on tiny HTTP redirects, not on the actual files.

In the four-line rule set above, line 1 turns on the mod_rewrite engine.

Line 2 checks whether the request for the file comes from Coral itself; if it does, the rule is skipped to prevent endless loops. In English, the condition says "If the HTTP request does not come from CoralWebPrx".

Line 3 checks whether the request is coming back to us because Coral has failed. In that case Coral adds ?coral-no-serve to the URL as its query string, or appends &coral-no-serve to a pre-existing query string, and we do not want to Coralize the URL either. In English, the condition says "If the query string does not end with coral-no-serve". For more information about the "coral-no-serve" line and how it applies to quota limits, please see this explanation.

Line 4 does the actual URL rewriting. If the previous two conditions are satisfied (the request is not from Coral itself, nor is it a failed Coral request), the rewrite is applied. The rule uses a regular expression to define the rewrite. In this case it matches any URL path that begins with "/images/foo", that is, a file in the "images" directory whose name begins with "foo"; on a match it returns the Coralized URL with the rest of the filename at the end. (Regular expressions are devilishly difficult. If you've never used grep or Perl regular expressions, try Mastering Regular Expressions or ask a local expert.)
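The decision made by the four-line rule set can also be tried out away from the server. The Python sketch below mirrors the two conditions and the rule with the same regular expressions; it is only a model for experimenting with patterns, not a substitute for testing against Apache. The function name and the default host foo.bar are illustrative.

```python
import re

UA_IS_CORAL = re.compile(r"^CoralWebPrx")        # line 2: request comes from Coral
NO_SERVE = re.compile(r"(^|&)coral-no-serve$")   # line 3: Coral gave up on this URL
RULE = re.compile(r"^/images/foo(.*)$")          # line 4: paths to Coralize


def coral_redirect(path, user_agent="", query_string="", host="foo.bar"):
    """Return the redirect target, or None if the request is served locally."""
    if UA_IS_CORAL.search(user_agent):
        return None
    if NO_SERVE.search(query_string):
        return None
    match = RULE.search(path)
    if match is None:
        return None
    target = "http://%s.nyud.net:8080/images/foo%s" % (host, match.group(1))
    if query_string:
        # mod_rewrite passes the original query string through by default
        # when the substitution itself contains no "?".
        target += "?" + query_string
    return target
```

For example, coral_redirect("/images/foo1.png") yields a Coralized URL, while the same path with query_string="coral-no-serve" or with a user agent starting with CoralWebPrx yields None.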

Coralizing certain referrer sites

If you know that certain sites (such as Slashdot) are likely to push high amounts of traffic your way, you can configure your server to proactively Coralize any traffic arriving from those sites, again via mod_rewrite. One set of such rules might include the following:

# Redirect users from certain pages to CoralCDN
<IfModule mod_rewrite.c>
RewriteEngine On
RewriteCond %{HTTP_USER_AGENT} !^CoralWebPrx
RewriteCond %{QUERY_STRING} !(^|&)coral-no-serve$
RewriteCond %{HTTP_COOKIE} heavyloaduser=true [OR]
RewriteCond %{HTTP_REFERER} slashdot\.org [NC,OR]
RewriteCond %{HTTP_REFERER} digg\.com [NC,OR]
RewriteCond %{HTTP_REFERER} blogspot\.com [NC,OR]
RewriteCond %{HTTP_REFERER} reddit\.com [NC,OR]
RewriteCond %{HTTP_REFERER} stumbleupon\.com [NC]
RewriteRule ^(.*)$ http://%{HTTP_HOST}.nyud.net%{REQUEST_URI} [R,L,CO=heavyloaduser:true:%{HTTP_HOST}]
</IfModule>
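The intent of these conditions is: never redirect Coral itself or a failed Coral request, and otherwise redirect when the visitor either already carries the heavyloaduser cookie or arrives from one of the listed sites. A small Python model of that boolean logic, with a hypothetical function name, can help when adding or removing referrers:

```python
import re

HEAVY_REFERRERS = re.compile(
    r"slashdot\.org|digg\.com|blogspot\.com|reddit\.com|stumbleupon\.com",
    re.IGNORECASE,  # corresponds to the [NC] flag
)


def should_coralize(user_agent="", query_string="", cookie="", referer=""):
    """Model of the intended rule logic: two guards ANDed with one OR group."""
    if re.search(r"^CoralWebPrx", user_agent):
        return False  # request comes from a Coral proxy
    if re.search(r"(^|&)coral-no-serve$", query_string):
        return False  # Coral could not serve this URL
    # OR group: returning heavy-load visitor, or a referrer on the list.
    return "heavyloaduser=true" in cookie or bool(HEAVY_REFERRERS.search(referer))
```

In mod_rewrite terms, every condition inside the OR group except the last needs the OR flag, because conditions are otherwise combined with an implicit AND.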

Nginx rewrite configuration

The following configuration for the Nginx webserver (akin to the above Apache mod_rewrite rules) was supplied by Sebastiaan Deckers.

# CoralCDN caching of the entire site
location / {
if ($http_user_agent ~ ^CoralWebPrx) {
break;
}
if ($query_string ~ (^|&)coral-no-serve$) {
break;
}
rewrite ^/(.*) $scheme://$host.nyud.net/$1? redirect;
}

Adjust the location match if you wish to serve only specific files via CoralCDN. For example:

location = /pictures/my_photo.jpg {
# snip
}

To serve all static assets via CoralCDN:

location ~ \.(jpg|gif|png|css|js)$ {
# snip
}

Coralizing mirrors

Many sites are using Coral to provide access to larger files, such as video clips. For example, during December 2004, Coral served ~100 TB of tsunami videos, linked from a variety of blogs and other websites. However, we ask that people posting links to mirrors only provide ONE Coralized URL per file, not Coralized links to "many" mirrors.

Remember that Coral automatically mirrors your file throughout our distributed system based on load. However, the naming mechanism currently used is based on URLs. Thus, if the same file is provided under several URLs (such as http://mirrorA.nyud.net/file and http://mirrorB.nyud.net/file ), Coral will need to fetch and store copies of the file from both mirrors, wasting downstream bandwidth, using unnecessary cache space, and reducing client performance.
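The cost of URL-based naming can be seen in a toy model. The Python sketch below is not Coral's implementation; it is a minimal cache keyed by URL, with a hypothetical fetch function standing in for the origin mirrors, showing that two Coralized mirror URLs for the same file cause two origin fetches and two stored copies.

```python
class UrlKeyedCache:
    """Toy cache that, like any URL-named cache, treats each URL as a distinct object."""

    def __init__(self, fetch):
        self._fetch = fetch      # callable that retrieves a URL from its origin
        self.stored = {}         # URL -> cached bytes
        self.origin_fetches = 0  # number of times an origin was contacted

    def get(self, url):
        if url not in self.stored:
            self.stored[url] = self._fetch(url)
            self.origin_fetches += 1
        return self.stored[url]


def fetch_from_origin(url):
    # Hypothetical origin: every mirror returns identical bytes.
    return b"same video bytes"


cache = UrlKeyedCache(fetch_from_origin)
for url in ("http://mirrorA.nyud.net/file",
            "http://mirrorB.nyud.net/file",
            "http://mirrorA.nyud.net/file"):
    cache.get(url)

print(cache.origin_fetches, len(cache.stored))  # prints: 2 2
```

Had all three requests used the single URL http://mirrorA.nyud.net/file, the model would record one fetch and one stored copy.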

Coralizing frequently-updated files

Coral can be integrated in more advanced ways. For example, suppose that you (1) have a set of files that are updated frequently, though perhaps irregularly, (2) wish to keep track of clicks in your web logs, and (3) want a simple, long-lived public URL. You can still use Coral!

This page describes a solution that meets all of these goals (as used here). It uses HTTP redirects (Apache's mod_rewrite) to send users from a long-lived non-Coralized URL to short-lived Coralized URLs, together with explicit cache control (Apache's mod_expires).
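One way such a scheme could be arranged is sketched below in Python. This is an assumption for illustration, not necessarily the method described on the linked page: it supposes that each update of a file is also published under a path that embeds a version label, and that the stable URL answers with a 302 to the Coralized form of the newest versioned path. The function name, the naming pattern, and the port 8080 (borrowed from the mod_rewrite examples on this page) are all hypothetical.

```python
import posixpath


def versioned_coral_url(host, stable_path, version, port=8080):
    """Build a short-lived Coralized URL for one version of a file.

    Assumes the origin also serves the file at e.g. /data/report.<version>.html.
    """
    directory, name = posixpath.split(stable_path)
    stem, ext = posixpath.splitext(name)
    versioned_name = "%s.%s%s" % (stem, version, ext)
    versioned_path = posixpath.join(directory, versioned_name)
    return "http://%s.nyud.net:%d%s" % (host, port, versioned_path)
```

Under these assumptions, every click still reaches the stable URL and appears in the origin's logs, while each new version gets a fresh Coralized URL, so Coral never serves a stale copy under the new name. Requests from Coral proxies must still be exempted from the redirect.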