check html templates for spelling errors
my spelling is horrible. it's quite annoying to update a webpage and then get complaints about spelling errors. usually all the human-language strings are stored neatly in a separate text file, but for smaller projects i often put them directly into the html templates. a proper ide will do the spell checking for you. if you use a less sophisticated editor, you have to do it yourself, or you can use one of the open source spell checkers: ispell, aspell, myspell and a few more.
at first i tried to write a shell script around one of them. then i remembered that i had once used one from python. there is a nice library for this, pyEnchant, which makes spell checking really easy. here is a little python script that checks all the html templates. the advantage is that the check can be run as a unittest or something similar, so you get warned when a new spelling error shows up in your project.
the script is quick and dirty and written for german, but you should get the idea.
from enchant.checker import SpellChecker
import codecs, os, re

chkr = SpellChecker("de_DE")

# patterns to strip before checking: html tags, jinja2/django
# code and some special words
rmPatterns = [r'<.*?>', r'\{%.*?%\}', r'\{\{.*?\}\}', r'me@norep\.com',
              u'projectName', u'FooName']

# get a list of all subdirectories of dirname, recursively
# (dirname itself is not included)
def listdirs(dirname):
    dirs = [os.path.join(dirname, f) for f in os.listdir(dirname)
            if os.path.isdir(os.path.join(dirname, f))]
    for d in dirs[:]:
        dirs += listdirs(d)
    return dirs

for d in listdirs('templates'):
    for f in [os.path.join(d, f) for f in os.listdir(d) if re.search(r'\.html$', f)]:
        # read the file as unicode, i always use unicode
        fd = codecs.open(f, 'r', 'utf-8')
        data = fd.read()
        fd.close()
        # remove the tags and codes defined in rmPatterns
        for p in rmPatterns:
            data = re.sub(p, '', data)
        # collect the misspelled words
        found = []
        chkr.set_text(data)
        for err in chkr:
            found.append(err.word)
        # if errors were found, print them with suggestions
        if len(found) > 0:
            print("%s: %s" % (len(found), f))
            for w in found:
                print("  : %s -> %s" % (w, ', '.join(chkr.dict.suggest(w))))
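two steps of the script, finding the template files and removing the markup, could also be written with the standard library alone. the following is only a sketch: the names find_templates, strip_markup and MARKUP_RE are made up for it. unlike listdirs, os.walk also visits the top-level directory itself. replacing markup with a space instead of an empty string is an assumption of this sketch, so that words on either side of a tag do not run together.

```python
import os
import re

# hypothetical names; the patterns mirror the html and jinja2/django
# patterns used in the script
MARKUP_PATTERNS = [
    r'<[^>]*>',      # html tags, including tags that span several lines
    r'\{%.*?%\}',    # jinja2/django block tags
    r'\{\{.*?\}\}',  # jinja2/django variable expressions
]
MARKUP_RE = re.compile('|'.join(MARKUP_PATTERNS), re.DOTALL)


def find_templates(root, suffix='.html'):
    """Yield every file below root (root itself included) ending in suffix."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            if name.endswith(suffix):
                yield os.path.join(dirpath, name)


def strip_markup(text):
    """Replace tags and template code with a space so words stay separated."""
    return MARKUP_RE.sub(' ', text)
```

the loop in the script would then become for f in find_templates('templates'), with strip_markup(data) taking the place of the rmPatterns loop for the markup patterns. special words like projectName would still need their own handling.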
the output for one of my projects is:
1: templates/pages/about.html
  : stösst -> stößt, störst
2: templates/pages/help.html
  : Registrier -> Registrier-, Registriere, Registriert, Registrieren
  : Bestätigungs -> Bestätigung, Bestätigungs-
3: templates/pages/legalNotice.html
  : St -> Set, St., Et, Kt, Sh, SV, Ist, Äst, Ast, Ost, Gst, Sät, So
  : mail -> Mail, mal, -mail, mail-, mai-
  : tel -> teil, Gel, Tel., Telex, Teller, Telekom