check html templates for spelling errors

Tuesday, November 17th, 2009
my spelling is horrible. it's quite annoying to update a webpage and then get complaints about spelling errors. usually all the human-language strings are stored neatly in a separate text file, but for smaller projects i often put them directly into the html templates. a proper ide will do the spell checking for you, but if yours is less sophisticated you have to do it yourself, or you can use one of the open source spell checkers: ispell, aspell, myspell and a few more.

at first i tried to write a shell script using one of them. then i remembered that i had once used one from python: there is a nice library, pyEnchant, which makes spell checking really easy. here is a little python script which checks all the html templates. the advantage is that the check can be run as a unit test or something similar, so you are warned whenever a new spelling error shows up in your project.

the script is quick and dirty and for german, but you should get the idea.

    from enchant.checker import SpellChecker
    import os, re, codecs

    chkr = SpellChecker("de_DE")

    # patterns to remove: in this case html and jinja2/django
    # code and some special words
    rmPatterns = [r'<.*?>', r'\{%.*?%\}', r'\{\{.*?\}\}',
                  r'me@norep\.com', u'projectName', u'FooName']

    # get a list of all subdirectories, recursively
    # (dirname itself is not part of the result)
    def listdirs(dirname):
        dirs = [os.path.join(dirname, f) for f in os.listdir(dirname)
                if os.path.isdir(os.path.join(dirname, f))]
        for d in dirs[:]:
            dirs += listdirs(d)
        return dirs

    for d in listdirs('templates'):
        for f in [os.path.join(d, f) for f in os.listdir(d)
                  if re.search(r'\.html$', f)]:
            # read file data as unicode, i always use unicode
            fd = codecs.open(f, 'r', 'utf-8')
            data = fd.read()
            fd.close()

            # remove the tags and codes defined in rmPatterns
            for p in rmPatterns:
                data = re.sub(p, '', data)

            # collect the misspelled words
            found = []
            chkr.set_text(data)
            for err in chkr:
                found.append(err.word)

            # if errors were found, print them with suggestions
            if len(found) > 0:
                print("%s: %s" % (len(found), f))
                for w in found:
                    print(" : %s -> %s" % (w, ', '.join(chkr.dict.suggest(w))))

the output for one of my projects is:

    1: templates/pages/about.html
     : stösst -> stößt, störst
    2: templates/pages/help.html
     : Registrier -> Registrier-, Registriere, Registriert, Registrieren
     : Bestätigungs -> Bestätigung, Bestätigungs-
    3: templates/pages/legalNotice.html
     : St -> Set, St., Et, Kt, Sh, SV, Ist, Äst, Ast, Ost, Gst, Sät, So
     : mail -> Mail, mal, -mail, mail-, mai-
     : tel -> teil, Gel, Tel., Telex, Teller, Telekom
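to make the unit test idea a bit more concrete, here is a rough sketch of how the check could be wrapped in a test case. it is not the script above, only one possible shape for it. the names strip_markup, misspelled_words, MARKUP_PATTERNS and TemplateSpellingTest are made up for this example, and the tiny word set is a stand-in so that the sketch runs without a dictionary installed. unlike the script above, it replaces markup with a space instead of an empty string, so that words on both sides of a tag are not glued together.

```python
import re
import unittest

# same idea as rmPatterns above: html tags and jinja2/django code
# (hypothetical name, not part of the original script)
MARKUP_PATTERNS = [r'<.*?>', r'\{%.*?%\}', r'\{\{.*?\}\}']


def strip_markup(text):
    """Replace tags and template code with a space."""
    for pattern in MARKUP_PATTERNS:
        text = re.sub(pattern, ' ', text, flags=re.DOTALL)
    return text


def misspelled_words(text, is_known):
    """Return the words in text for which is_known(word) is false.

    is_known is any callable taking a word and returning True/False.
    """
    # runs of letters only; digits, underscores and punctuation split words
    words = re.findall(r'[^\W\d_]+', strip_markup(text), flags=re.UNICODE)
    return [w for w in words if not is_known(w)]


class TemplateSpellingTest(unittest.TestCase):
    # stand-in word list, an assumption for this sketch only
    known = {'hello', 'world'}

    def is_known(self, word):
        return word.lower() in self.known

    def test_clean_template(self):
        html = '<p class="x">Hello {{ user.name }} world</p>'
        self.assertEqual(misspelled_words(html, self.is_known), [])

    def test_typo_is_reported(self):
        html = '<p>Hello wrold</p>{% block foo %}'
        self.assertEqual(misspelled_words(html, self.is_known), ['wrold'])
```

in a real project you would probably pass something like enchant.Dict("de_DE").check as is_known, feed in the contents of your template files instead of the inline strings, and run the whole thing with python -m unittest.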