Adding reCAPTCHA support to Django 1.0 Comments

October 16, 2008 by Tor Brede Vekterli

Django comes with a very flexible built-in framework for handling comments bound to arbitrary model objects, complete with automatic generation and handling of timestamps and a honeypot field designed to stop spambots. However, spammers can often circumvent these measures easily, prompting the use of other approaches: server-side spam filtering services like Akismet, requiring a moderator to approve all comments before they are publicly shown, or, more commonly, a captcha.

An increasingly popular captcha solution, both due to its effectiveness and the fact that it is free to use, is reCAPTCHA, which combines combating spam with digitizing old books. Essentially, everybody wins! All you have to do is register for a private/public key pair at their site (used to ensure spammers cannot fake the data), and you're good to go. That is, after you've added support for it in your application, of course (which incidentally is what this is about).

This post gives a hopefully straightforward overview of how I went about adding support for reCAPTCHA to this blog, using Django 1.0 and its contrib.comments framework as a base. I have tried to be as unobtrusive as possible, adding no more code (or ugly hacks) than I deemed necessary.

Update Dec. 10 2008: Rewrote implementation completely, as the old, signal-based one failed to properly handle user feedback for invalid captchas.

Prerequisites

If you do not already have one, the first step is to register at reCAPTCHA.net in order to get your public/private keypair. Open up your project's settings.py and add the following lines to it somewhere:

RECAPTCHA_PUBLIC_KEY = 'public_key_from_recaptcha'
RECAPTCHA_PRIVATE_KEY = 'private_key_from_recaptcha'

Obviously, the actual values should be replaced with those of your keypair.

Next, download the Python reCAPTCHA client and extract recaptcha/client/captcha.py to your application's main directory (the one in which you can find models.py et al). We do not really need the other files in the client package, so we'll leave those alone.

Note: for this article, the application package directory used is called main (i.e. models.py will be main.models, views.py will be main.views etc.). Change this to reflect your own application. It is also assumed that you have already gone through the process of adding support for the comments framework to your application, i.e. added django.contrib.comments to settings.py, added the appropriate URL-include to urls.py and synchronized the database.

Form subclassing and captcha verification

There are certain challenges facing us when we want to use reCAPTCHA with the comments framework. One of these is that reCAPTCHA requires the client's IP address, which is usually only available in the HTTP request object's META map. This object is, however, not given to any forms by the post_comment function in django.contrib.comments.views. It is Django policy not to mix request objects in with form objects, as doing so increases coupling, a position that is highly understandable. However, we need that darn IP address and we need it now!

A potential solution to this is to use signals instead. A signal handler gets both the to-be-inserted comment object and the request object to play with, and signals also allow full decoupling between the framework code and our own code. Nice and elegant. This was the solution that was covered in this article originally, but as was brought to my attention, it didn't handle captcha verification errors gracefully at all. The only available action when a signal handler returns a non-successful result is to display an HTTP 403 error page, something that is hardly favorable. It will stop the spammers, but it might also stop some of your legitimate users (if the HTTP error is not a concern, or is even a desired trait, see the section on the signal-based version).

A more traditional approach using form subclassing seems to be the way to go at the current point in time, as this will allow us to use the post_comment function unchanged and still get nice error messages when fields are invalid. But first of all, how should we deal with the mystery of the missing IP? Not very prettily, to be honest. For my purposes on this site, which include a forced preview, I haven't found any elegant way to pass the IP to the form object. As a ham-fisted solution of sorts, I decided to simply create a wrapper around post_comment which creates a copy of the request object and adds a custom field to its POST map (as POST is passed to form objects). If anyone has a better/cleaner solution for this, please leave a comment!

main/comments.py:

import copy
from main import captcha
from django.conf import settings
from django import http
from django.contrib import comments
from django.contrib.comments.forms import CommentForm
from django.contrib.comments.views.comments import post_comment

# ugly, ugly hack: smuggle the client IP to the form through a copy of POST
def wrapped_post_comment(request, next=None):
	# refuse requests that try to supply the internal field themselves
	if '__recaptcha_ip' in request.POST:
		return http.HttpResponseBadRequest()
	request_copy = copy.copy(request)
	request_copy.POST = request.POST.copy() # create a mutable copy
	request_copy.POST['__recaptcha_ip'] = request.META['REMOTE_ADDR']
	return post_comment(request_copy, next)
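
To see why the wrapper works on a copy instead of editing the request in place, here is a minimal sketch of the same pattern that runs without Django. FakeRequest and fake_post_comment are hypothetical stand-ins of my own (a plain dict plays the role of the POST map); only the copy-and-inject logic mirrors the code above.

```python
import copy


class FakeRequest(object):
    """Hypothetical stand-in for Django's request object (not the real class)."""

    def __init__(self, post, meta):
        self.POST = post
        self.META = meta


def fake_post_comment(request, next=None):
    """Stand-in for the real view: reports the data a form would receive."""
    return dict(request.POST)


def wrapped(request, next=None):
    # Refuse requests that try to supply the internal field themselves.
    if '__recaptcha_ip' in request.POST:
        return None
    request_copy = copy.copy(request)        # shallow copy of the request
    request_copy.POST = dict(request.POST)   # separate, mutable POST data
    request_copy.POST['__recaptcha_ip'] = request.META['REMOTE_ADDR']
    return fake_post_comment(request_copy, next)


original = FakeRequest({'comment': 'hello'}, {'REMOTE_ADDR': '203.0.113.7'})
seen_by_form = wrapped(original)
spoofed = wrapped(FakeRequest({'__recaptcha_ip': '1.2.3.4'},
                              {'REMOTE_ADDR': '203.0.113.7'}))
```

The form sees the extra field, while the original request's POST data is left untouched, and a request that already carries the field is turned away.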

IP address note: If you're seeing comments being saved with an IP of 127.0.0.1 (i.e. localhost), and your Django instance is sitting behind a reverse proxy (which is the case on Webfaction et al), try adding django.middleware.http.SetRemoteAddrFromForwardedFor to your middleware list to have request.META['REMOTE_ADDR'] automatically be rewritten to the address that the request is being forwarded for (as the documentation mentions, only use this if you know for a fact that there is a trusted reverse proxy in place).
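
The idea behind that rewrite can be sketched in a few lines. This is not the middleware's code: client_ip and its behind_trusted_proxy flag are hypothetical names, and the sketch assumes the proxy puts the original client first in the X-Forwarded-For header.

```python
def client_ip(meta, behind_trusted_proxy=False):
    """Return the address to treat as the client's IP (illustrative helper)."""
    forwarded = meta.get('HTTP_X_FORWARDED_FOR', '')
    if behind_trusted_proxy and forwarded.strip():
        # The header may hold a comma-separated chain of addresses; this
        # sketch assumes the first entry is the original client.
        return forwarded.split(',')[0].strip()
    # Without a trusted proxy the header is client-controlled, so ignore it.
    return meta.get('REMOTE_ADDR')


proxied = {'REMOTE_ADDR': '127.0.0.1',
           'HTTP_X_FORWARDED_FOR': '198.51.100.4, 10.0.0.1'}
trusted_result = client_ip(proxied, behind_trusted_proxy=True)
untrusted_result = client_ip(proxied)
```

The flag is the important part: anyone can send an X-Forwarded-For header, so it should only be honoured when a proxy you control is known to set it.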

Next up is the form itself. For convenience, we subclass the existing CommentForm class but override the clean method to take the passed captcha information into account. The captcha is only checked when the comment is actually submitted for posting, not for previewing. By setting a value in the self.errors map, we prevent the comments framework from saving the comment and instead make it redisplay the preview page with an error message of our choice.

main/comments.py (continued):

class ReCaptchaCommentForm(CommentForm):
	def __init__(self, target_object, data=None, initial=None):
		super(ReCaptchaCommentForm, self).__init__(target_object, data, initial)
		
	def clean(self):
		# If the form isn't being previewed, check the captcha
		if 'preview' not in self.data:
			challenge_field = self.data.get('recaptcha_challenge_field')
			response_field = self.data.get('recaptcha_response_field')
			client = self.data.get('__recaptcha_ip') # always set by our code

			check_captcha = captcha.submit(challenge_field, response_field,
				settings.RECAPTCHA_PRIVATE_KEY, client)

			if not check_captcha.is_valid:
				self.errors['recaptcha'] = 'Invalid captcha value'

		return self.cleaned_data

There are no additional form fields defined, so we have to explicitly test for, and show, the error in the preview/form template, since it won't be done automatically (we'll get to that shortly—note that since I only show captchas on the forced preview page, I will usually just refer to the preview template and not the form template. What you do here is fully up to you). Here we also see the rather non-savory IP-value hack in action.
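
The control flow of that clean method can be tried out without Django or the reCAPTCHA servers. In this sketch FakeResult, fake_submit and check_form are hypothetical stand-ins, and the rule that only the answer 'correct' passes is invented purely for illustration.

```python
class FakeResult(object):
    """Hypothetical stand-in for the object returned by captcha.submit."""

    def __init__(self, is_valid):
        self.is_valid = is_valid


def fake_submit(challenge, response, private_key, remote_ip):
    # Pretend only the answer 'correct' passes; the real check is remote.
    return FakeResult(response == 'correct')


def check_form(data, submit=fake_submit):
    """Return the errors map the form would end up with."""
    errors = {}
    if 'preview' not in data:  # only verify when actually posting
        result = submit(data.get('recaptcha_challenge_field'),
                        data.get('recaptcha_response_field'),
                        'private-key-placeholder',
                        data.get('__recaptcha_ip'))
        if not result.is_valid:
            errors['recaptcha'] = 'Invalid captcha value'
    return errors


preview_errors = check_form({'preview': 'Preview'})
bad_errors = check_form({'post': 'Post', 'recaptcha_response_field': 'wrong'})
good_errors = check_form({'post': 'Post', 'recaptcha_response_field': 'correct'})
```

A preview never triggers the check, a failed post gets the 'recaptcha' error, and a successful post leaves the errors map empty.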

Now that we have our wrapped post_comment function and our form subclass, it's time to make the comments framework actually use them rather than its own versions. Since Python is a fully dynamic language, we can monkeypatch our way around things, overriding as we see fit (another example of this for comments in Django can be found at prettyprinted.net). The comments framework exposes a get_form() function that we override first:

main/comments.py (continued):

def recaptcha_get_form():
	return ReCaptchaCommentForm
	
comments.get_form = recaptcha_get_form

Next up, we simply shove the following lines in at the end of models.py as it is included early, something that should trigger all the fancy action in comments.py (if I understand Python correctly):

main/models.py:

from django.contrib.comments.views import comments
from main.comments import wrapped_post_comment

comments.post_comment = wrapped_post_comment
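
Whether that assignment really takes effect comes down to how Python looks names up. The sketch below uses a throwaway module object as a hypothetical stand-in for django.contrib.comments.views.comments, and it assumes the framework looks the view up through the module at request time rather than holding its own reference to the function.

```python
import types

# Hypothetical stand-in for django.contrib.comments.views.comments.
views = types.ModuleType('fake_views')


def post_comment(request):
    return 'original(%s)' % request


views.post_comment = post_comment

# Equivalent of "from fake_views import post_comment" in main/comments.py:
# this name keeps pointing at the original function after the patch.
original_post_comment = views.post_comment


def wrapped_post_comment(request):
    return 'wrapped(%s)' % original_post_comment(request)


# Equivalent of the assignment at the end of models.py.
views.post_comment = wrapped_post_comment

# A lookup through the module now finds the wrapper.
result = views.post_comment('req')
```

Because the wrapper holds on to the function it imported before the patch, it calls the original view instead of recursing into itself.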

Templates and tags, oh my

Now most of the backend code is in place, but there's nothing that actually renders the reCAPTCHA code to the templates yet. That will be our next task. To be as generic as possible, adding a custom template tag is a good choice for this, as it can easily be reused (a form field with a custom render-method could also have been created instead, if desired. This only shows one possibility). If none exists already, create a templatetags subdirectory in the same directory as models.py et al. and create a file called recaptcha.py with the following contents:

from django import template
from django.conf import settings
from main import captcha

register = template.Library()

@register.simple_tag
def recaptcha_html():
	return captcha.displayhtml(settings.RECAPTCHA_PUBLIC_KEY)

We can now use the {% load recaptcha %} command in our templates to bring the {% recaptcha_html %} tag into scope, which will render the appropriate HTML.

Again, for this blog, I chose not to render the captcha on every post comment form, as I don't want to hit the reCAPTCHA servers on every single pageview. Rather, I force a preview and show the captcha there instead. This is very straightforward: simply copy <your django directory>/contrib/comments/templates/comments/preview.html into a comments-subdirectory of your own application's template directory and modify it to your needs. Forcing a preview also has the added bonus of thwarting most spambots by default, as these tend to just do a single HTTP POST to whatever's specified as the action in the comments form.

For displaying the captcha just below the comment preview, the following suffices:

templates/comments/preview.html:

(...snip...)
{% if form.errors %}
<h1>Oops, please correct the error{{ form.errors|pluralize }} below</h1>
{% else %}
<h1>Preview your comment</h1>
<blockquote>{{ comment|linebreaks }}</blockquote>
{% endif %}
{% if not form.errors or form.errors.recaptcha %}{% recaptcha_html %}{% endif %}
{% if form.errors.recaptcha %}<div class="error">{{ form.errors.recaptcha }}</div>{% endif %}
{% if not form.errors or form.errors.recaptcha %}
<div>
    and <input type="submit" name="post" value="Post your comment" id="submit"> or make changes:
</div>
{% endif %}
{% for field in form %}
(...snip...)

Here the captcha is displayed only if the comment is ready to be posted (i.e. it doesn't show it when the only option is to preview), and we also explicitly test for, and print, its error message if present.

The same template inclusion principle applies no matter where or when you want the captcha to show up. You just have to make sure the tag is kept within the comments-form for the values to be passed to our code. The reCAPTCHA widget wraps itself in a <div>, so no need to do that explicitly.

Wrap-up

I hope that this information has been of help, or at least has been a mildly interesting read. I'm no expert at either Python or Django, so if I have made any grave mistakes here, or if something could be done more elegantly, why not, well, leave a comment?


Alternative approach (with a caveat!)

As mentioned, it is possible to do the captcha checking unobtrusively using signals. However, also as mentioned, this will cause any invalid captchas to show up as HTTP 403 errors for the end user! You have been warned. I'm still including this old code here in case someone finds it useful, and in case the Django comments framework gets support for different ways of dealing with signal handler errors.

The prerequisites are the same as with the form subclassing. Next, I simply created a file called comments.py and put it in the main-package (based on pre-1.0 code found on djangosnippets and nikolajbaer.us):

from main import captcha
from django.conf import settings

def verify_captcha(sender, comment, request, **kwargs):
	challenge_field = request.POST.get('recaptcha_challenge_field')
	response_field = request.POST.get('recaptcha_response_field')
	client = request.META['REMOTE_ADDR']
	
	check_captcha = captcha.submit(challenge_field, response_field,
		settings.RECAPTCHA_PRIVATE_KEY, client)
		
	if check_captcha.is_valid is False:
		return False
		
	return True

This covers all the actual validation work that we have to do here.

We get signal

Since we want to stop spammers in their tracks, we want a comment to never hit the database if it doesn't come with a valid captcha value. We also don't want to modify the comments framework code itself.

The comments framework lets us register a signal slot that is invoked after a comment has been cleaned and validated but before it is actually saved to the database, at which point we can simply return False to cancel the whole operation. From the signal's documentation, we see that we get passed the comment instance and the HTTP request context, which is all we need, as reCAPTCHA passes us two values in HTTP POST that we will send to their servers for validation. The client's IP address is also included in the request, so they can send out assassination teams to spammer hideouts (I presume).

The Django docs recommend putting the code that connects the signal to its handler somewhere that reliably gets called early, like models.py, so that's where I put it:

from django.contrib.comments.signals import comment_will_be_posted
from main.comments import verify_captcha

comment_will_be_posted.connect(verify_captcha)
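
The veto mechanism can be sketched without Django. FakeSignal and post below are hypothetical, heavily simplified stand-ins; the one detail borrowed from Django is that sending a signal returns a list of (receiver, response) pairs, which the caller can inspect for a False.

```python
class FakeSignal(object):
    """Hypothetical, much-simplified stand-in for a Django signal."""

    def __init__(self):
        self.receivers = []

    def connect(self, receiver):
        self.receivers.append(receiver)

    def send(self, sender, **kwargs):
        # Return (receiver, response) pairs, one per connected receiver.
        return [(r, r(sender, **kwargs)) for r in self.receivers]


def post(signal, comment, request):
    """Sketch of the view logic: a receiver returning False kills the comment."""
    for receiver, response in signal.send(sender=None, comment=comment,
                                          request=request):
        if response is False:
            return 'rejected by %s' % receiver.__name__
    return 'saved'


def verify_captcha(sender, comment, request, **kwargs):
    # Stand-in check: the real handler asks the reCAPTCHA servers.
    return request.get('recaptcha_response_field') == 'correct'


will_be_posted = FakeSignal()
will_be_posted.connect(verify_captcha)

accepted = post(will_be_posted, 'hi', {'recaptcha_response_field': 'correct'})
rejected = post(will_be_posted, 'hi', {'recaptcha_response_field': 'nope'})
```

The handler can only say yes or no; it has no channel for handing an error message back to the form, which is the limitation that motivated the rewrite.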

For templating and rendering, the exact same principles apply as with the form subclassing, only here you won't get any error-feedback for the captcha in the form.

Wrap up, for real this time

It's unfortunate that the old signal-based article was up for such a long time without me realizing that it didn't work quite as expected. I had honestly figured adding proper error handling to it would be possible. My sincere apologies if it caused anyone any headaches, side-aches, hair loss or other assorted conditions.

Comments

Tor Brede Vekterli on December 10, 2008

Testing that the form subclassing, templates et al work properly on the live server

Mp3 file on July 7, 2010

Thanks for the given information. your site is awesome and helpful. I'll come across it next time

4shared mp3 downloads on July 14, 2010

very details instruction. I enjoyed reading your articles. This is truly a great read for me. I have bookmarked it and I am looking forward to reading new articles. Keep up the good work!

Tor Brede Vekterli on July 14, 2010

I'm leaving the spam comments in (with removed URL) since it's such delightful irony to have them here given the subject matter :)

Perhaps kitten-auth is the future?

tonytaylor on August 8, 2011

Worked like a charm! The only thing that I'd like to note is that 'django.middleware.http.SetRemoteAddrFromForwardedFor' has now been deprecated, but request.META['REMOTE_ADDR'] has simply changed to request.META.get('REMOTE_ADDR').

Thanks so much!
