Revision of Laconica Word Filter Plugin (Wordfilter) from 12/22/2009

07 Jan Tagged code, FLOSS, Laconica, microblogging, php, plugin, profanity

The TWiT Network has pretty strict rules about profanity across all channels, including the netcasts, chatrooms and the TWiT Army Canteen. Moderators usually lurk in the IRC channel and on the microblog, but once in a while some profanity slips through the cracks, so I wrote a little word filter to preempt it. I'm not sure if they'll use it, but it was fun to write. Sorry about the profanity in this post, but it's sort of necessary.

Installation

Add the following to config.php, then add your own search and replacement terms.

# Wordfilter plugin.
require_once('plugins/Wordfilter.php');

# === List search / replace terms here ===

# Replacements should be no longer than the term they replace, since notices are length-limited.
# Any search term is replaced even if it's in the middle of a word.
# For instance, "fuck" will sanitize "motherfucker", but this can cause false positives:
# 'twat' will falsely sanitize "wristwatch". Use spaces to limit matches.

# $config['wordfilter']['search'][] = 'twat'; // will be matched anywhere, even within words
# $config['wordfilter']['search'][] = ' twat'; // matches words beginning with "twat"
# $config['wordfilter']['search'][] = 'twat '; // matches words ending with "twat"
# $config['wordfilter']['search'][] = ' twat '; // only matches the word "twat"

$config['wordfilter']['search'][] = 'blatherskite'; // for testing so you don't have to swear on your site.
$config['wordfilter']['replace'][] = 'blatherin';

$config['wordfilter']['search'][] = ' twat ';
$config['wordfilter']['replace'][] = ' tool ';

$config['wordfilter']['search'][] = ' cock ';
$config['wordfilter']['replace'][] = ' hen ';

$config['wordfilter']['search'][] = 'fuck';
$config['wordfilter']['replace'][] = 'frak';

$config['wordfilter']['search'][] = 'shit';
$config['wordfilter']['replace'][] = 'poop';

$config['wordfilter']['search'][] = 'bitch';
$config['wordfilter']['replace'][] = 'dog';

/* alternate list syntax
$config['wordfilter']['search'] = array('fuck', 'shit', 'bitch');
$config['wordfilter']['replace'] = array('frak', 'poop', 'dog');
*/


$wordfilter = new Wordfilter();
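
To see how the spacing variants above behave before putting terms on a live site, here is a minimal standalone sketch. It assumes the plugin's matching boils down to padding the text with spaces, calling str_ireplace, and trimming the result. wordfilter_preview is a hypothetical helper name and is not part of the plugin or of Laconica.

```php
<?php
// Hypothetical helper (not part of the plugin): pad with spaces,
// replace case-insensitively, then trim the padding again.
function wordfilter_preview($text, array $search, array $replace)
{
    $padded = ' ' . $text . ' ';
    return trim(str_ireplace($search, $replace, $padded));
}

// Bare term: matched anywhere, so "wristwatch" is a false positive.
echo wordfilter_preview('my wristwatch', array('twat'), array('tool')), "\n";      // my wristoolch

// Space on both sides: only the standalone word is replaced.
echo wordfilter_preview('my wristwatch', array(' twat '), array(' tool ')), "\n";  // my wristwatch
echo wordfilter_preview('What a TWAT', array(' twat '), array(' tool ')), "\n";    // What a tool

// Leading space only: matches words that begin with the term.
echo wordfilter_preview('twats galore', array(' twat'), array(' tool')), "\n";     // tools galore
```

Running this with php on the command line is a quick way to try out a word list without posting test notices.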

Plugin source code

Save the following to plugins/Wordfilter.php
<?php if (!defined('LACONICA')) exit(1);
/**
 * Wordfilter Plugin
 *
 * @category Plugin
 * @package  Laconica
 * @author   Kyle Hasegawa  @kylehase
 * @license  http://www.fsf.org/licensing/licenses/agpl-3.0.html GNU Affero General Public License version 3.0
 * @version  Wordfilter.php,v 0.3 2009/06/15 00:40:25 +0900
 *
 */


class Wordfilter extends Plugin
{
    function __construct()
    {
        parent::__construct();
    }
       
    // Hook StartNoticeSave
    function onStartNoticeSave($notice) {
        // Get search and replace arrays
        $search = common_config('wordfilter','search');
        $replace = common_config('wordfilter','replace');

        // Wrap notice in spaces. Easier and faster than regex ^ $
        $notice->rendered = ' '.$notice->rendered.' ';
        $notice->content = ' '.$notice->content.' ';

        // Case insensitive replacements
        $notice->rendered = str_ireplace($search, $replace, $notice->rendered);
        $notice->content = str_ireplace($search, $replace, $notice->content);

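        // Trim the padding added above; Laconica saves the notice afterwards.
        $notice->rendered = trim($notice->rendered);
        $notice->content = trim($notice->content);

        // Return true so the event keeps processing and the notice is saved.
        return true;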
    }
}
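
Two edge cases follow from the space-wrapping approach and are worth checking against your own word list. The sketch below assumes the same pad, replace, trim sequence as onStartNoticeSave. pad_replace_trim is a hypothetical stand-in so the behavior can be run outside Laconica.

```php
<?php
// Hypothetical stand-in for the three steps in onStartNoticeSave.
function pad_replace_trim($text, array $search, array $replace)
{
    return trim(str_ireplace($search, $replace, ' ' . $text . ' '));
}

$search  = array(' twat ');
$replace = array(' tool ');

// Punctuation next to the word means there is no trailing space to match.
echo pad_replace_trim('you twat!', $search, $replace), "\n";   // you twat!

// Back-to-back occurrences share one space, which the first match consumes,
// so the second occurrence is left alone.
echo pad_replace_trim('twat twat', $search, $replace), "\n";   // tool twat
```

If either case matters for your site, one option within the existing config format is to list punctuated variants such as ' twat!' as additional search terms.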

Update

Regarding longer replacements, thefrogman reports:

longer words show up in their entirety on army [web interface] but get cutoff in twhirl [clients]. Didn't seem to break anything
So it's not a major problem if the replacement string is longer than the original.
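
If you would still rather know which replacements can lengthen a notice, a small check along these lines could be run against the config arrays. This is only a sketch: wordfilter_longer_replacements is a hypothetical helper, not part of the plugin, and it compares byte lengths with strlen, which differs from character counts for multibyte text.

```php
<?php
// Hypothetical helper: returns the search => replace pairs whose
// replacement is longer (in bytes) than the term it replaces.
function wordfilter_longer_replacements(array $search, array $replace)
{
    $longer = array();
    foreach ($search as $i => $term) {
        if (isset($replace[$i]) && strlen($replace[$i]) > strlen($term)) {
            $longer[$term] = $replace[$i];
        }
    }
    return $longer;
}

// Example lists in the same shape as $config['wordfilter'].
$search  = array('fuck', 'shit', 'bitch');
$replace = array('frak', 'manure', 'dog');

foreach (wordfilter_longer_replacements($search, $replace) as $term => $new) {
    echo "'$term' grows to '$new'\n";   // 'shit' grows to 'manure'
}
```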

Update v0.3

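Version 0.3 changes the matching to prevent false positives: each notice is wrapped in spaces before the replacement runs, so a search term written with a leading or trailing space only matches at the start or end of a word.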

All code on this site is free for use at your own risk and provided as-is under the WTFPL license unless otherwise stated. Attribution is appreciated but not required.
Blog content, with the exception of externally quoted material, is licensed under the Creative Commons Attribution 3.0 license.