Skip to content Skip to sidebar Skip to footer

Google App Engine - Caching Generated Html

I have written a Google App Engine application that programatically generates a bunch of HTML code that is really the same output for each user who logs into my system, and I know

Solution 1:

In order of speed:

  1. memcache
  2. cached HTML in data store
  3. full page generation

Your caching solution should take this into account. Essentially, I would probably recommend using memcache anyways. It will be faster than accessing the data store in most cases and when you're generating a large block of HTML, one of the main benefits of caching is that you potentially didn't have to incur the I/O penalty of accessing the data store. If you cache using the data store, you still have the I/O penalty. The difference between regenerating everything and pulling from cached html in the data store is likely to be fairly small unless you have a very complex page. It's probably better to get a bunch of very fast cache hits off memcache and do a full regenerate every once in a while than to make a call out to the data store every time. There's nothing stopping you from invalidating the cached HTML in memcache when you update, and if your traffic is high enough to warrant it, you can always do a multi-level caching system.

However, my main concern is that this is premature optimization. If you don't have the traffic yet, keep caching to a minimum. App Engine provides a set of really convenient performance analysis tools, and you should be using those to identify bottlenecks after you've got at least a few QPS of traffic.

Anytime you're doing performance optimization, measure first! A lot of performance "optimizations" turn out to either be slower than the original, exactly the same, or they have negative user experience characteristics (like stale data). Don't optimize until you're certain you have to.

Solution 2:

A while ago I wrote a series of blog posts about writing a blogging system on App Engine. You may find the post on static generation of HTML pages of particular interest.

Solution 3:

This is not a complete solution, but might offer some interesting option for caching.

Google Appengine Frontend Caching allows you a way of caching without using memcache.

Solution 4:

Just serve a static version of your site

It's actually a lot easier than you think.

If you already have a file that contains all of the urls for your site (ex urls.py), half the work is already done.

Here's the structure:

+-/website
+--/static
+---/html
+--/app/urls.py
+--/app/routes.py
+-/deploy.py

/html is where the static files will be served from. urls.py contains a list of all the urls for your site. routes.py (if you moved the routes out of main.py) will need to be modified so you can see the dynamically generated version locally but serve the static version in production. deploy.py is your one-stop static site generator.

How you layout your urls module depends. I personally use it as a one-stop-shop to fetch all the metadata for a page but YMMV.

Example:

main = [
  { 'uri':'about-us', 'url':'/', 'template':'about-us.html', 'title':'About Us' }
]

With all of the urls for the site in a structured format it makes crawling your own site easy as pie.

The route configuration is a little more complicated. I won't go into detail because there are just too many different ways this could be accomplished. The important piece is the code required to detect whether you're running on a development or production server.

Here it is:

# Detect whether this the 'Development' serverDEV = os.environ['SERVER_SOFTWARE'].startswith('Dev')

I prefer to put this in main.py and expose it globally because I use it to turn on/off other things like logging but, once again, YMMV.

Last, you need the crawler/compiler:

import os
import sys
import urllib2
from app.urls import main

port = '8080'
local_folder = os.getcwd() + os.sep + 'static' + os.sep + 'html' + os.sep
print'Outputting to: ' + local_folder

print'\nCompiling:'for page in main:
  http = urllib2.urlopen('http://localhost:' + port + page['url'])
  file_name = page['template']
  path = local_folder + file_name
  local_file = open(path, 'w')
  local_file.write(http.read())
  local_file.close()
  print' - ' + file_name + ' compiled successfully...'

This is really rudimentary stuff. I was actually stunned with how easy it was when I created it. This is literally the equivalent of opening your site page-by-page in the browser, saving as html, and copying that file into the /static/html folder.

The best part is, the /html folder works like any other static folder so it will automatically be cached and the cache expiration will be the same as all the rest of your static files.

Note: This handles a site where the pages are all served from the root folder level. If you need deeper nesting of folders it'll need a slight modification to handle that.

Solution 5:

Old thread, but i'll comment anyways as technology has progressed a little... Another idea that may or may not be approproate for you is to generate the HTML and store it on Google Cloud Storage. Then access the HTML via a CDN link that the cloud storage provides for you. No need to check memcache or wait for datastore to wake up on new requests. Ive started storing all my JavaScript, CSS, and other static content (images, downloads etc) like this for my appengine apps and its working well for me.

Post a Comment for "Google App Engine - Caching Generated Html"