Static-Site Comments via Email

Comments on statically generated blogs are often handled by third-party providers such as Disqus. I was uncomfortable with that approach: I wanted to keep things static, with minimal JavaScript and minimal back-end code. But a comment system still has to deal with spam, privacy, and security. I didn’t want to reinvent the wheel; I just wanted to allow comments in a minimalistic way without ending up down a rabbit hole.

Failed Idea: Static Files

My first idea was to have the web server write each submitted comment to a static file, then have a script on my Jekyll staging server copy the files over with scp and add each comment to the appropriate post. But I would still have had to integrate some spam filtering, and I didn’t like the idea of managing the comment files manually. My next idea grew out of these shortcomings.

The Email Approach

Spam filtering was one of the first items on my list of concerns, and my epiphany was: why not just use my email spam filtering? Sending comments via email would give me spam filtering without implementing a separate system. And instead of managing comment files manually, as in my previous approach, I could browse the comments in an email client. This sounded like a better approach, so I ran with it.

HTML form to static comment

With a rough idea in my head, I was left with a number of questions: How will I implement this? What tools do I have to work with? What will I need to create?

My current tool-chain:

Additional tools to include:

Programming TODO list:

  • HTML submit comment form
  • HTML comments template
  • PHP script to process comment form and send email
  • Python script to fetch email and append to appropriate blog post

We’ll end up with a mix of languages: Ruby, PHP, and Python. Since Jekyll is written in Ruby, it might have made sense to build everything in Ruby, but it’s a language I have never used. So I went with a combination of what I had on hand and what I knew.

Comment form and JavaScript validation

The comment form is a standard HTML form, submitted as an HTTP POST (no fancy AJAX-style submits). If you look at the source of this page, you’ll find a single form: basic HTML styled with Bootstrap’s CSS classes. The name field is required, but a pseudonym is fine. The email address is required, but it is never published; I only use it if I want to reply to the commenter directly by email. The website field is optional; if given, it is used to hyperlink the commenter’s name. The comment textarea accepts a few HTML tags, and anything else is cleaned out after submission. The form looks as follows (you can try it out at the bottom of the page too!):

Prior to allowing comments, this site had no JavaScript. But it makes sense to do some JavaScript-based validation in the browser before submitting the form. The server still does its own validation and sanitizing of the input.

<form name="comment-form" role="form" method="POST" action="/comment.php" onsubmit="return validate_comment_form()">

The browser checks that the required fields (name, email, and comment) are non-empty. The validation code, in its entirety, is as follows:

function form_group_class(element, class_string) {
	document.forms['comment-form'][element].parentElement.setAttribute('class', class_string);
}
function validate_comment_form() {
	var comment_form = document.forms['comment-form'];
	form_group_class('name', 'form-group');
	form_group_class('email', 'form-group');
	form_group_class('comment', 'form-group');
	var retval = true;
	if (comment_form['name'].value === "") {
		form_group_class('name', 'form-group has-error');
		retval = false;
	}
	if (comment_form['email'].value === "") {
		form_group_class('email', 'form-group has-error');
		retval = false;
	}
	if (comment_form['comment'].value === "") {
		form_group_class('comment', 'form-group has-error');
		retval = false;
	}

	return retval;
}

The first function, form_group_class(element, class_string), is a helper that sets the class attribute on a form field’s parent div. This is how the Bootstrap has-error class gets applied to flag a required field that was left empty.

One hidden field, called page, is not validated here. It contains the Jekyll page path, which becomes the subject line of the email and is later used to determine which post the comment belongs to. Arbitrary paths aren’t allowed, so any attempt to craft an attack through this field gets filtered out later on. The field’s value comes from the Jekyll template variable:

{{ page.path }}

For example, the hidden page field for this page renders as:

<input type="hidden" name="page" value="_posts/2017-03-24-static-site-comments-via-email.markdown">

Important note: the _posts portion is later dropped and only the base name of the file gets used.
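To make “only the base name” concrete, here is a small Python sketch using os.path.basename, the same call the fetch script applies to the subject line later on. The helper name post_file_from_page is hypothetical, and the second path is a made-up traversal attempt:

```python
import os.path

def post_file_from_page(page):
    """Reduce a submitted Jekyll page.path to a bare file name (hypothetical helper).

    Directory components, including '../' traversal attempts, are discarded,
    so the result can only name a file directly inside the posts folder.
    """
    return os.path.basename(page)

print(post_file_from_page("_posts/2017-03-24-static-site-comments-via-email.markdown"))
# -> 2017-03-24-static-site-comments-via-email.markdown
print(post_file_from_page("../../etc/passwd"))
# -> passwd
```

A tampered value like the second one can still only point at _posts/passwd, which doesn’t exist, so the comment is simply rejected as belonging to a missing post. (On POSIX systems basename splits on “/” only.)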

At this point we have a relatively straightforward HTML form and 23 lines of JavaScript that hand the input off for server-side processing. Not a bad start.

Sending mail from the web server

The first step is to configure a mail transfer agent (MTA). I went with ssmtp rather than Postfix, Sendmail, or Exim because, in my opinion, its configuration is simpler. On Ubuntu 16.04 I install it as follows:

sudo apt install ssmtp

Then I edit the settings in /etc/ssmtp/ssmtp.conf to be:

root=comments@stevescott.ca
mailhub=mailserv.stevescott.ca:465
rewriteDomain=stevescott.ca
hostname=myhostname
FromLineOverride=YES
UseTLS=Yes
AuthUser=comments@stevescott.ca
AuthPass=#password

And /etc/ssmtp/revaliases is set to:

root:comments@stevescott.ca:mailserv.stevescott.ca:465
mainuser:comments@stevescott.ca:mailserv.stevescott.ca:465

My web server already had PHP for a previous project, so rather than installing some other language/framework, I just went with what I had. To use the PHP mail() function, we still need to tell it to use ssmtp. The php.ini is edited as follows (in my case it’s in /etc/php/7.0/fpm/php.ini):

[mail function]
; For Win32 only.
; http://php.net/smtp
;SMTP = localhost
; http://php.net/smtp-port
;smtp_port = 25

; For Win32 only.
; http://php.net/sendmail-from
;sendmail_from = me@example.com

; For Unix only.  You may supply arguments as well (default: "sendmail -t -i").
; http://php.net/sendmail-path
sendmail_path = /usr/sbin/ssmtp -t

Everything between [mail function] and sendmail_path stays commented out; the only active setting is sendmail_path, which points PHP at ssmtp.

My setup uses a dedicated email address for comments. The ‘comments@’ address is convenient since I run the email server and can easily add it. Another option is to use an existing address and filter the comments into their own folder.

At this point we’re ready for PHP to process the comment form and send the email. We need to sanitize the inputs first. This is my initial sanitizing code in PHP:

<?php

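// FILTER_FLAG_STRIP_LOW also strips control characters (e.g. CR/LF) so the name can't inject mail headers
$name = filter_input(INPUT_POST, 'name', FILTER_SANITIZE_STRING, FILTER_FLAG_STRIP_LOW);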
$email = filter_input(INPUT_POST, 'email', FILTER_SANITIZE_EMAIL);
$web = filter_input(INPUT_POST, 'web', FILTER_SANITIZE_URL);
if(strlen($web)>0 && !preg_match('/^http/', $web))
	$web = 'http://'.$web;
$comment = filter_input(INPUT_POST, 'comment');
if($comment!=strip_tags($comment))
	$content_type="text/html";
else
	$content_type="text/plain";
$page = filter_input(INPUT_POST, 'page', FILTER_SANITIZE_FULL_SPECIAL_CHARS);

?>

The name field gets full sanitizing because it goes into the Reply-To header alongside the email address, as Reply-To: name <email@domain.com>. Nothing in that header should contain HTML tags or line breaks.
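Why line breaks matter: a carriage return or line feed in the name could push text onto a new header line and smuggle in extra headers (a classic header-injection trick, e.g. adding a Bcc:). As an illustration only, here is a hypothetical Python helper, reply_to_header, that strips control characters from the display name and then uses the standard library’s email.utils.formataddr, which also quotes names containing specials such as commas:

```python
import re
from email.utils import formataddr

# Matches ASCII control characters, including CR (\r) and LF (\n).
_CONTROL_CHARS = re.compile(r"[\x00-\x1f\x7f]")

def reply_to_header(name, email):
    """Build a Reply-To value such as 'Jane Doe <jane@example.com>' (hypothetical helper).

    Control characters are removed from the display name so it cannot break
    onto a new header line; formataddr() adds quotes when the name contains
    specials like ',' or ':'.
    """
    clean_name = _CONTROL_CHARS.sub("", name).strip()
    if _CONTROL_CHARS.search(email):
        raise ValueError("control characters in email address")
    return formataddr((clean_name, email))

print(reply_to_header("Jane Doe", "jane@example.com"))
# -> Jane Doe <jane@example.com>
print(reply_to_header("Jane\r\nBcc: victim@example.org", "jane@example.com"))
# -> "JaneBcc: victim@example.org" <jane@example.com>
```

The injected Bcc: ends up as harmless text inside a quoted display name on a single header line. Note that formataddr RFC 2047-encodes non-ASCII names, so a name like “Zoë” won’t appear verbatim in the raw header.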

The web parameter is sanitized as a URL with PHP’s FILTER_SANITIZE_URL filter. If it doesn’t start with http, the http:// prefix is added.

It may seem strange to leave the comment unfiltered. Why not strip everything but the allowable tags? I leave it as-is, warts and all, because I want SpamAssassin to see all of its potential spamminess. If the comment checks out, it gets cleaned when it is fetched (explained later), before it is inserted into the Jekyll post. The only check on the comment at this stage is whether it contains tags, and that only decides the content type (text/html or text/plain).

The page parameter is set by Jekyll, but since it comes back from the browser, it can be tampered with. Sanitizing it removes any potentially malicious garbage from what becomes the email subject line. The value is also used later to identify a post by its file name, so these checks and sanitizing are very important.

After the inputs are sanitized, I validate them. For the name and comment fields, I only check whether they are empty or too long; for the optional web field, only the length. For the email address format, I rely (for better or worse) on PHP’s FILTER_VALIDATE_EMAIL, with the added check that the domain has an MX record. This should help weed out some spam.

<?php

$error = '';
//name length
if(empty($name)) {
	$error .= "You forgot to enter your name.<br />\n";
} else if(strlen($name)>256) {
	$error .= "The name you entered is too long.<br />\n";
}

//email length
if(empty($email)) {
	$error .= "You must enter a valid email address.<br />\n";
} else if(strlen($email)>256) {
	$error .= "The email you entered is too long.<br />\n";
} else if(!filter_var($email, FILTER_VALIDATE_EMAIL)) {
	$error .= "The email address you entered is invalid.<br />\n";
} else { //valid address, now check domain MX record
	$domain = explode("@", $email)[1];
	if(!checkdnsrr($domain,"MX")) {
		$error .= "The email address you entered seems to be invalid.<br />\n";
	}
}

//web length
if(strlen($web)>256) {
	$error .= "The website address you entered is too long.<br />\n";
}

//comment length
if(empty($comment)) {
	$error .= "You forgot to enter your comment.<br />\n";
} else if(strlen($comment)>4096) {
	$error .= "Your comment is too long.<br />\n";
}
?>
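For anyone who would rather read the rules without PHP, here is a rough Python equivalent of the emptiness and length checks above, collecting messages into a list instead of an HTML string. It is a sketch, not the site’s code: the MX lookup is left out (Python’s standard library has no MX query), the email format check is a deliberately loose regular expression standing in for FILTER_VALIDATE_EMAIL, and validate_comment is a hypothetical name.

```python
import re

# Limits taken from the PHP validation above.
MAX_NAME = 256
MAX_EMAIL = 256
MAX_WEB = 256
MAX_COMMENT = 4096

# Loose stand-in for FILTER_VALIDATE_EMAIL: something@something.tld, no spaces.
_EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")

def validate_comment(name, email, web, comment):
    """Return a list of error messages; an empty list means the input passed."""
    errors = []
    if not name:
        errors.append("You forgot to enter your name.")
    elif len(name) > MAX_NAME:
        errors.append("The name you entered is too long.")

    if not email:
        errors.append("You must enter a valid email address.")
    elif len(email) > MAX_EMAIL:
        errors.append("The email you entered is too long.")
    elif not _EMAIL_RE.match(email):
        errors.append("The email address you entered is invalid.")

    # The website is optional, so only its length is checked.
    if len(web) > MAX_WEB:
        errors.append("The website address you entered is too long.")

    if not comment:
        errors.append("You forgot to enter your comment.")
    elif len(comment) > MAX_COMMENT:
        errors.append("Your comment is too long.")
    return errors

print(validate_comment("Jane", "jane@example.com", "", "Nice post!"))  # -> []
print(validate_comment("", "not-an-address", "", "x" * 5000))         # three errors
```

One subtle difference: PHP’s strlen() counts bytes while Python’s len() counts characters, so a comment full of multi-byte characters hits the PHP limit sooner.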

If $error is non-empty at this point, we print out the offences, and direct the user back to fix them. I won’t paste all of the gory if-then details (for that, have a look at the source on GitHub). But if all is OK, we build the email and send it. The important part of the email construction is as follows:

<?php

$headers = "Reply-To: ".$name." <".$email.">"."\r\n".
 "X-Website: ".$web."\r\n".
 "From: comments@stevescott.ca"."\r\n".
 "MIME-Version: 1.0"."\r\n".
 "Content-type:".$content_type.";charset=UTF-8"."\r\n".
 "X-Forwarded-For: ".$_SERVER['REMOTE_ADDR']."\r\n";
	
$ret = mail('comments@stevescott.ca', $page, $comment, $headers);
?>

The commenter is identified by the Reply-To header.

The To and From headers are the same. This is intentional.

The commenter’s website URL (which may be empty) goes into an application-specific X-Website header, to be extracted later.

The X-Forwarded-For header contains the submitter’s IP address. It is there for moderation only, so that abusive IP addresses can be blocked.
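To make the resulting message concrete, here is a hypothetical Python counterpart to the PHP mail() call, built with the standard library’s email.message.EmailMessage. It isn’t part of the site (PHP does the sending); it just shows the same headers and the same plain-versus-HTML decision. The tag test is a crude regular expression standing in for PHP’s strip_tags() comparison, and the function name and sample values are made up:

```python
import re
from email.message import EmailMessage
from email.utils import formataddr

COMMENTS_ADDRESS = "comments@stevescott.ca"  # same address for To and From, as above

def build_comment_message(name, email, web, page, comment, remote_addr):
    """Assemble a comment email with the headers described above (sketch)."""
    msg = EmailMessage()
    msg["From"] = COMMENTS_ADDRESS
    msg["To"] = COMMENTS_ADDRESS
    msg["Subject"] = page                      # Jekyll page.path identifies the post
    msg["Reply-To"] = formataddr((name, email))
    if web:
        msg["X-Website"] = web                 # optional commenter website
    msg["X-Forwarded-For"] = remote_addr       # submitter IP, for moderation only
    # Crude approximation of PHP's `$comment != strip_tags($comment)` test.
    subtype = "html" if re.search(r"<[^>]+>", comment) else "plain"
    msg.set_content(comment, subtype=subtype)  # UTF-8 text/plain or text/html body
    return msg

msg = build_comment_message(
    "Jane Doe", "jane@example.com", "http://example.com",
    "_posts/2017-03-24-static-site-comments-via-email.markdown",
    "Nice post, <b>thanks</b>!", "192.0.2.10")
print(msg.get_content_type())  # -> text/html
```

A side benefit of EmailMessage with its default policy: assigning a header value that contains CR or LF raises ValueError instead of silently creating an extra header.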

If the mail() function is successful, PHP has done its job. We can move on to getting the mail into the comment section of the page.

Download email and insert comment

With Jekyll being written in Ruby, it might have been logical to write a Jekyll extension or Ruby script to download the email and insert the comment. I chose Python for no particular reason other than that it’s what I’ve been working in lately.

The full Python source for _fetchimapcomments.py is on GitHub; here I’ll highlight the important parts.

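First, since the script connects to an IMAP server, it needs credentials. These are stored in the Jekyll project folder alongside the Python script, in a file called .fetchimapcomments. By default, Jekyll skips files whose names start with a dot or an underscore when building the site, so neither the script nor the credentials end up in _site. The file has these settings: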

[imap]
host = mail.yourdomain.com
port = 993
username = comments@yourdomain.com
password = password-goes-here
folder = INBOX

[jekyll]
post = _posts

This configuration file is read with Python’s configparser module:

import configparser

config = configparser.ConfigParser()
config.read('.fetchimapcomments')
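One detail worth knowing: configparser returns every value as a string, so the port comes back as '993', not 993. Here is a small sketch that reads the same layout from an inline string (placeholder values, as above) and converts the port with getint():

```python
import configparser

SAMPLE = """
[imap]
host = mail.yourdomain.com
port = 993
username = comments@yourdomain.com
password = password-goes-here
folder = INBOX

[jekyll]
post = _posts
"""

config = configparser.ConfigParser()
config.read_string(SAMPLE)  # config.read('.fetchimapcomments') does the same from disk

port_text = config['imap']['port']     # '993' (a str)
port = config['imap'].getint('port')   # 993 (an int)
print(repr(port_text), port, config['jekyll']['post'])
# -> '993' 993 _posts
```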

The IMAP credentials are passed as arguments to a function called fetch_imap_messages.

import imaplib
from email import policy
from email.parser import BytesParser
from imaplib import IMAP4_SSL

def fetch_imap_messages(host, port, username, password, remote_folder):
    comments = []
    #config values are strings, so convert the port explicitly
    with IMAP4_SSL(host, int(port)) as imap:
        try:
            imap.login(username, password)
        except imaplib.IMAP4.error:
            print("Login failed for: %r. Check credentials." % username)
            return []

        typ, data = imap.select(remote_folder)
        if typ != 'OK':
            print("Invalid mailbox: %r" % remote_folder)
            return []

        typ, mailnums = imap.search(None, 'ALL')
        for i in mailnums[0].split():
            typ, message_data = imap.fetch(i, '(RFC822)')
            #parse the raw bytes directly so non-UTF-8 messages don't raise UnicodeDecodeError
            message = BytesParser(policy=policy.default).parsebytes(message_data[0][1])

            comment_data = extract_email_comment(message)
            if not comment_data:
                continue
            comments.append(comment_data)

        imap.close()

    return comments

TL;DR fetch_imap_messages method

  • connect to the IMAP server using SSL
  • login
  • select the folder
  • grab ALL messages from the folder
  • loop over the messages
  • extract the relevant message headers and body: extract_email_comment(message)

Because policy.default is used, the parser returns a Python email.message.EmailMessage object.
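To get a feel for what extract_email_comment receives, here is a sketch that parses a hand-written raw message (roughly what the PHP form produces; every value is invented) with the standard library’s BytesParser and policy.default, then reads the headers the script cares about. get_content() undoes any transfer encoding and charset, which get_payload() alone would not:

```python
from email import policy
from email.parser import BytesParser

RAW = (
    b"Reply-To: Jane Doe <jane@example.com>\r\n"
    b"X-Website: http://example.com\r\n"
    b"From: comments@example.com\r\n"
    b"To: comments@example.com\r\n"
    b"Subject: _posts/2017-03-24-static-site-comments-via-email.markdown\r\n"
    b"Message-Id: <20170331140250.0013E41530@mail.example.com>\r\n"
    b"Date: Fri, 31 Mar 2017 10:02:50 -0400\r\n"
    b"MIME-Version: 1.0\r\n"
    b"Content-Type: text/plain; charset=UTF-8\r\n"
    b"\r\n"
    b"Great post!\r\n"
)

message = BytesParser(policy=policy.default).parsebytes(RAW)

print(type(message).__name__)         # -> EmailMessage
print(message.is_multipart())         # -> False
print(message.get_content_type())     # -> text/plain
print(message['Subject'])
print(message['Date'].datetime)       # timezone-aware datetime parsed from Date
print(message.get_content().strip())  # -> Great post!
```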

The extraction of the relevant comment data happens in a function extract_email_comment which takes a Python email.message as its argument.

import os
from dateutil.parser import parse  #assumption: the script's `parse` is python-dateutil's
from lxml.html.clean import clean_html  #lxml >= 5.2 also needs the lxml_html_clean package

def extract_email_comment(message):
    #if message is multipart: assume spam, ignore, and continue.
    if message.is_multipart():
        #print("Unsupported multipart message.")
        return None

    #bad content type: assume spam, ignore, and continue
    if message.get_content_type() not in ['text/plain','text/html']:
        #print("Unsupported content-type: %r. Must be 'text/plain' or 'text/html'")
        return None

    #get_content() decodes the transfer encoding and charset (get_payload() does not)
    comment = message.get_content().strip()
    if message.get_content_type()=='text/html':
        comment = clean_html(comment)

    #important parts needed for comment
    message_id = message['Message-Id'] #used to uniquely identify comment
    website = message['X-Website'] or ''
    date = parse(message['Date'])

    reply_to = message['Reply-To'] or ''
    #if the '<' is not in the string, then the name is missing
    if reply_to.rfind('<')==-1:
        name = "anonymous"
        #email = reply_to
    else:
        name = reply_to[0:reply_to.rfind('<')-1] #extract name from address
        #don't really need the email, but if I did, this is how I would extract it
        #email = reply_to[reply_to.rfind('<')+1:reply_to.rfind('>')]

    #use basename to prevent any ../../ shenanigans
    post_file = os.path.basename(message['subject'] or '')

    #comment data for yaml front matter
    comment_data = {
        'message_id': literal_str(message_id),
        'author': name,
        'author_url': literal_str(website),
        'content': folded_str(comment),
        'date': date,
        'post': post_file
    }

    return comment_data

A few important points to highlight about the code:

  • Ignore junk messages: anything multipart, or with a content type other than text/plain or text/html
  • Use the lxml library’s clean_html function if the message is HTML (this is very important to avoid XSS or other malicious posts)
  • Use the Message-Id email header as a unique identifier so the same comment is never added twice
  • Extract the base file name from the email’s subject for safety against path-traversal attacks
  • literal_str and folded_str are string types with YAML representers I defined to format the strings appropriately (defined just below the import section of _fetchimapcomments.py)
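The rfind('<') slicing in extract_email_comment works for the Reply-To format the PHP script sends. A more forgiving alternative (not what the script does) is the standard library’s email.utils.parseaddr, which also copes with quoted display names and bare addresses. A minimal sketch; the helper name commenter_name is hypothetical, and the "anonymous" fallback mirrors the code above:

```python
from email.utils import parseaddr

def commenter_name(reply_to):
    """Return the display name from a Reply-To value, or 'anonymous'."""
    if not reply_to:
        return "anonymous"
    name, _address = parseaddr(reply_to)
    return name or "anonymous"

print(commenter_name("Jane Doe <jane@example.com>"))     # -> Jane Doe
print(commenter_name('"Doe, Jane" <jane@example.com>'))  # -> Doe, Jane
print(commenter_name("jane@example.com"))                # -> anonymous
print(commenter_name(None))                              # -> anonymous
```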

fetch_imap_messages therefore returns a Python list of the comment dictionaries built by extract_email_comment. The "__main__" section of _fetchimapcomments.py then inserts each one:

    post_folder = config['jekyll']['post']
    comments = fetch_imap_messages(config['imap']['host'], config['imap']['port'], config['imap']['username'], config['imap']['password'], config['imap']['folder'])
    for c in comments:
        insert_post_comment(c,post_folder)

That leaves the insert_post_comment function. It reads the Jekyll post file, separates the YAML front matter from the body of the post, and, if the comment doesn’t already exist, adds it to the comments list in the front matter before rewriting the file.

import os
import yaml  #PyYAML

def insert_post_comment(comment, post_folder):
    #pop post file from comment data (since we don't need to store it)
    post_file = comment.pop('post', None)
    if not post_file:
        print("Comment %r has no post file." % comment.get('message_id'))
        return False
    post_file = os.path.join(post_folder, post_file)

    #check for existence of the post file
    if not os.path.isfile(post_file):
        print("%r is missing or invalid." % post_file)
        return False

    #read the whole post; the file is closed again before we rewrite it below
    with open(post_file, encoding='utf-8') as pf:
        post_data = pf.read()

    #extract post front matter (between the first two '---' separators)
    sep_len = len('---')
    fm_end = post_data.find('---', sep_len)
    post_fm = post_data[post_data.find('---')+sep_len:fm_end]
    #extract text section from post (everything after the front matter)
    post_text = post_data[fm_end+sep_len+1:]

    #todo: should probably check for failure here
    post_yaml = yaml.safe_load(post_fm) or {}

    if not post_yaml.get('comments'):
        post_yaml['comments'] = []

    #loop over comments, if ID already exists, skip it.
    def does_comment_exist(id, post_comments):
        for c in post_comments:
            if c.get('message_id')==id:
                return True
        return False

    if does_comment_exist(comment['message_id'], post_yaml['comments']):
        print("Skipping comment: %r for %r (already exists)" % (comment['message_id'], post_file))
        return False
    print("Adding comment %r for %r" % (comment['message_id'], post_file))

    post_yaml['comments'].append(comment)

    #todo: catch/test for failure here
    yaml_fm = yaml.dump(post_yaml)

    #update post with new comment
    #overwrites with new front matter and existing post_text
    with open(post_file, 'w', encoding='utf-8') as pf:
        pf.write('---\n')
        pf.write(yaml_fm)  #yaml.dump output already ends with a newline
        pf.write('---\n')
        pf.write(post_text)  #write() adds no extra trailing newline, unlike print()

    return True

The yaml library, for yaml.safe_load and yaml.dump, is from PyYAML.
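The slicing in insert_post_comment finds the second '---' anywhere in the file, so a front-matter value that happens to contain '---' (say, a title like “A --- B”) would cut the YAML short. A more defensive option, sketched here as a hypothetical helper rather than the script’s actual code, treats only a line consisting solely of '---' as a delimiter. It returns the raw front matter and body text, which can then go to yaml.safe_load as before:

```python
def split_front_matter(post_data):
    """Split a Jekyll post into (front_matter, body) strings.

    Only lines that are exactly '---' count as delimiters, so '---' elsewhere
    on a line does not end the front matter. Returns (None, post_data) if the
    file has no complete front matter block.
    """
    lines = post_data.splitlines(keepends=True)
    if not lines or lines[0].rstrip("\r\n") != "---":
        return None, post_data
    for i in range(1, len(lines)):
        if lines[i].rstrip("\r\n") == "---":
            front_matter = "".join(lines[1:i])
            body = "".join(lines[i + 1:])
            return front_matter, body
    return None, post_data  # opening '---' without a closing one

post = "---\ntitle: A --- B\nlayout: post\n---\nHello, world.\n"
fm, body = split_front_matter(post)
print(repr(fm))    # -> 'title: A --- B\nlayout: post\n'
print(repr(body))  # -> 'Hello, world.\n'
```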

In all, _fetchimapcomments.py contains (as of this writing) about 172 lines of Python, including comments and whitespace. At this point, we have everything we need to get comments up and running on a Jekyll-based website.

Build and publish

When a comment email arrives, I inspect it visually in Mozilla Thunderbird (I’ve been a dedicated Thunderbird user for almost 13 years). The folder view looks like this:

The email message will look something like:

So my comment-moderation tool is simply an email client, and any email client will do: I delete the messages I don’t want posted (spam) from the Inbox. If I’m on the road, I can do this from Mail.app on the iPhone and use Panic Inc.’s Prompt to handle the terminal tasks.

If the comments look OK, I switch to a terminal window connected to my staging server and run _fetchimapcomments.py. The output should look something like this:

$ ./_fetchimapcomments.py
Adding comment '<20170331140250.0013E41530@mailserv.stevescott.ca>' for '_posts/2017-03-24-static-site-comments-via-email.markdown'

Since jekyll build --watch is usually running, it picks up the changes immediately. I then check the updated comments in my browser, where I expect to see something like:

If the comments have been added successfully, I push the changes live with rsync, using a one-line script in my home bin folder, ~/bin/publish_stevescott_ca.sh:

rsync -avz -e 'ssh -p 22' /var/www/stevescott_ca/_site/ mainuser@stevescott.ca:/var/www/stevescott/

If your SSH server listens on a non-standard port (anything other than 22), change the -p value accordingly.

Try it out

If you’d like to try this on your own Jekyll site, fork or grab the files I’ve posted on GitHub and give it a whirl. Let me know how it goes. The contact form on the about page piggybacks on the same approach, minus the public comments.

Feel free to contact me about this post; the contact form uses the method described here.

Happy commenting!

P.S. The Downsides (and one more alternative)

My email approach rules out one of the important use cases for statically generated blogs: hosting on GitHub Pages. I considered submitting the form with a ‘mailto:’ action, which removes the server-side PHP dependency, but that introduces other problems. ‘Mailto’ links require a configured email client, and many people just use webmail. They also require the person to use the email account set up on their device or computer, likely with their real name and primary email address. What if they want to comment under a pseudonym or a different email address? This approach might cut down on spam, but it might also discourage people from commenting.


2 comments on 'Static-Site Comments via Email'

  • Christopher Mackay

    Great idea, Steve — thanks for sharing it. This nudges me one step closer to trying out Jekyll.

  • Steve Scott

    Thanks for the comment Chris! I'm very happy with Jekyll. Definitely worth testing it out.