python, i like the snake

90% of my development work over the last few years was done in java. about a year ago i had to create a small project for the OLPC (One Laptop per Child). OLPC uses python very heavily, so i developed the tool in python. before that, i had only done a few very simple things in python. a few weeks ago i started to develop web applications in python. here are a few thoughts and a list of the packages and software i used.

python versus java

compared with java, python is IMHO not as clean. it isn't as "designed" as java is. but compared with, for example, PHP, it's a hyper-clean and intuitive language. java is statically typed and, although it is bytecode compiled like python, it needs an explicit compilation step. this makes developing python programs a bit more agile. for both java and python there are lots of libraries and packages available. as far as i have seen, the python ones usually seem to be of higher quality, but this may just be due to the selection i made and not a general rule. the most important reason for using python for webapps instead of java is that python programs usually need fewer lines of code than java programs. from what i have seen so far, this is possible while keeping at least the same readability as java code. on the other hand, it's easier to write horrible code in python than in java, so python needs a bit more discipline. i will use python mainly for small, not so critical projects, at least until i am fluent in it.

what did i use for webapps

with java i usually used apache with mod_jk to connect to tomcat. i looked for a similar setup in python and found that there are a lot of possible ways, but the most promising was the WSGI standard. there is a module for apache2 (mod_wsgi), so i was happy. WSGI is short for Web Server Gateway Interface and it sits between the webserver and the python application. in java, an application server shares a common context for all servlets and requests. with WSGI, the python scripts are loaded separately in each process (the apache processes, i think), so an in-memory cache or communication between requests handled by different processes is not possible. but if you are used to developing your applications with a cluster in mind, it doesn't hurt. you can use memcached as a cache.

because WSGI is a very low-level interface, it's best to use a proper web framework. there are lots of them. most are full-blown frameworks and i am a bit allergic to those. i like to put together the tools i want instead of being pushed to do things a certain way (you are never forced. full-blown frameworks are like certain girlfriends: they give subtle hints but don't force you, they are open to everything, but you get afraid of what awaits you if you really go with something other than their preferred way). the only one i kind of liked was web.py. it's not the most beautiful, but it's well built and lets you do whatever you want. i use it mainly for mapping urls to code and for web stuff like redirecting, sending errors and so on.

to access a database there are different packages available. i looked for ORM libraries and found quite a lot of different implementations. the two best (as far as i can judge) are Storm and SQLAlchemy. both are flexible, easy to use and have lots of functionality that doesn't get in your way if you don't use it. compared with hibernate, it's a dream.

to render html or whatever, i was looking for a good template language. there are many of them and i decided to use Jinja2. it's the one i liked most, but hey, it's a template language; almost all of them are useful, properly built and support about the same feature set.

if i remember properly, that's all i needed, at least for the web side of the applications. python also ships with many very useful modules in its standard library, so you don't need to add lots of third-party modules. no crappy apache commons jars that provide functionality they forgot to implement in the standard library.

URLConnection and https

with a java.net.URLConnection i can connect to any http server, and also to https servers. if i connect to an https server with a browser, i might get a message that the certificate is not trusted. i am prompted to examine the certificate and mark it as trusted. after that i can connect without any problems. the same must be done when connecting with a URLConnection: if we try to connect to an https server via URLConnection and the certificate is not trusted, a javax.net.ssl.SSLHandshakeException is thrown with the message "PKIX path building failed"… at least with the sun jvm version 1.5.

add certificate to a KeyStore

first we need to download the certificate from the webserver. this can be done with firefox. once you have accepted the server's certificate, you can save it by selecting Edit->Preferences->Advanced->Encryption->View Certificates->Your Certificates. select the certificate there, click on export and save it somewhere on your hard disk. java cannot work with this certificate file directly… actually it can, but it's easier to import it into a KeyStore file with the keytool program, which is distributed with the jdk:

    keytool -import -alias aliasOfCertificate -file certificateFile.cer \
        -keystore myKeystore

this command adds the certificate certificateFile.cer as a trusted certificate to the keystore file named myKeystore. the tool prompts for a password, which protects the keystore file.
instead of adding the certificate to myKeystore, we could also add it to the default keystore of the jvm:

    keytool -import -alias aliasOfCertificate -file certificateFile.cer \
        -keystore $JAVA_HOME/lib/security/cacerts

the default password of this keystore is "changeit". this usually requires root privileges, and afterwards the certificate is trusted by every java program that uses this jvm. it's a bit like polluting the "global" environment, so it's better to avoid it.

use that keystore

if i have a URLConnection with https as the protocol, it's an instance of HttpsURLConnection and i can simply cast it. HttpsURLConnection has a method setSSLSocketFactory. this socket factory can be configured to accept certain certificates. a socket factory which trusts the certificates in myKeystore can be created with the following code:

    import java.io.File;
    import java.io.FileInputStream;
    import java.io.InputStream;
    import java.net.URL;
    import java.net.URLConnection;
    import java.security.KeyStore;
    import javax.net.ssl.HttpsURLConnection;
    import javax.net.ssl.SSLContext;
    import javax.net.ssl.SSLSocketFactory;
    import javax.net.ssl.TrustManager;
    import javax.net.ssl.TrustManagerFactory;
    import javax.net.ssl.X509TrustManager;
    ...
    // load the keystore created with keytool
    InputStream in = new FileInputStream(new File("path/to/myKeystore"));
    KeyStore ks = KeyStore.getInstance(KeyStore.getDefaultType());
    ks.load(in, "PasswordUsedWithKeytool".toCharArray());
    in.close();
    // a trust manager that trusts the certificates in our keystore
    TrustManagerFactory tmf = TrustManagerFactory.getInstance(TrustManagerFactory.getDefaultAlgorithm());
    tmf.init(ks);
    X509TrustManager defaultTrustManager = (X509TrustManager) tmf.getTrustManagers()[0];
    SSLContext context = SSLContext.getInstance("TLS");
    context.init(null, new TrustManager[] {defaultTrustManager}, null);
    SSLSocketFactory sslSocketFactory = context.getSocketFactory();

here the keystore is loaded first. you have to provide the password you typed in when creating the keystore file. after that a TrustManager is created via a TrustManagerFactory initialised with our KeyStore. then the SSLContext is created and initialised with the trust manager. after that an SSLSocketFactory can be obtained with the getSocketFactory method of the SSLContext. we can use it for our URLConnection like this:

    URL url = new URL("https://thesecuredomain.org");
    URLConnection con = url.openConnection();
    ((HttpsURLConnection) con).setSSLSocketFactory(sslSocketFactory);
    con.connect();
    in = con.getInputStream();
    ...
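the steps above can also be wrapped into a small reusable helper. this is only a sketch: the class name KeyStoreSocketFactories and its methods are hypothetical, not part of any library. instead of blindly casting the first trust manager, it searches for an X509TrustManager, and passing null as the keystore makes the TrustManagerFactory fall back to the jvm's default trust store.

```java
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.security.GeneralSecurityException;
import java.security.KeyStore;
import javax.net.ssl.SSLContext;
import javax.net.ssl.SSLSocketFactory;
import javax.net.ssl.TrustManager;
import javax.net.ssl.TrustManagerFactory;
import javax.net.ssl.X509TrustManager;

// hypothetical helper wrapping the keystore -> trust manager -> ssl context steps
final class KeyStoreSocketFactories {

    private KeyStoreSocketFactories() {}

    // loads a keystore file created with keytool (path and password are your own)
    static KeyStore loadKeyStore(String path, char[] password)
            throws IOException, GeneralSecurityException {
        KeyStore ks = KeyStore.getInstance(KeyStore.getDefaultType());
        InputStream in = new FileInputStream(path);
        try {
            ks.load(in, password);
        } finally {
            in.close(); // close the file even if loading fails
        }
        return ks;
    }

    // builds a socket factory that trusts the certificates in ks;
    // null means: use the jvm's default trust store (cacerts)
    static SSLSocketFactory socketFactoryFor(KeyStore ks) throws GeneralSecurityException {
        TrustManagerFactory tmf =
                TrustManagerFactory.getInstance(TrustManagerFactory.getDefaultAlgorithm());
        tmf.init(ks);
        // look for the X509TrustManager instead of assuming it is element 0
        for (TrustManager tm : tmf.getTrustManagers()) {
            if (tm instanceof X509TrustManager) {
                SSLContext context = SSLContext.getInstance("TLS");
                context.init(null, new TrustManager[] {tm}, null);
                return context.getSocketFactory();
            }
        }
        throw new GeneralSecurityException("no X509TrustManager available");
    }
}
```

usage, assuming a keystore file created with keytool as described above: `((HttpsURLConnection) con).setSSLSocketFactory(KeyStoreSocketFactories.socketFactoryFor(KeyStoreSocketFactories.loadKeyStore("path/to/myKeystore", "PasswordUsedWithKeytool".toCharArray())));`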

jj1 webservice step by step

we will create a little json-rpc webservice in java using jj1. it's a step-by-step tutorial using a fresh install of tomcat and java. if you use an ide, it should be easy to adapt. the webservice generates an ascii banner from a string. the code to generate the ascii banner already exists and just needs to be made accessible as a webservice.

what you need

you need a default tomcat installation and a jvm (java virtual machine). to test whether a jvm is available, type java -version at the command prompt. if something like

    java version "1.6.0_07"
    Java(TM) SE Runtime Environment (build 1.6.0_07-b06)
    Java HotSpot(TM) Server VM (build 10.0-b23, mixed mode)

appears, you have one installed. otherwise download one from sun or use your package manager to install one. the easiest way to install tomcat is with your package manager if you use a linux-like os. if there is no tomcat package, take a look at the tomcat setup page.

directories

i usually create a main folder named after the project. in this folder there is a src folder for the java sources, a bin folder for the compiled classes and a dist folder for jars and wars. if it's a webapp, there is also a web folder which contains the webapp itself. so for our little example app it looks like:

    asciiText
    asciiText/src
    asciiText/bin
    asciiText/dist
    asciiText/web
    asciiText/web/WEB-INF
    asciiText/web/WEB-INF/lib

dependencies

we need three jars. first the jj1 jar. the second is the stringtree jar, which is used to encode and decode json. third, we need the service implementation. save all the jars in the asciiText/web/WEB-INF/lib folder.

create the context object

to access a method via json-rpc, we have to expose it to the web. in jj1 this is done via a context object. this is our first… and only java class. we create it in the ch/kerbtier/asciiws subfolder of the src folder; these folders represent the package the class lives in. the file should be named AsciiContext.java and look like:

    package ch.kerbtier.asciiws;

    import ch.kerbtier.asciitext.AsciiRenderer;
    import com.googlecode.jj1.server.JsonRpc;

    public class AsciiContext {
        private AsciiRenderer renderer = new AsciiRenderer();

        // the @JsonRpc annotation publishes this method as a webservice
        @JsonRpc
        public String getText(String input, String font, int size) {
            return renderer.createAscii(input, font, size);
        }
    }

that's it. we publish the method getText (with the JsonRpc annotation) as a webservice. internally it just calls the createAscii method of the renderer.

web.xml

a webapplication is configured by a web.xml file which lives in the WEB-INF folder of the webapp. we need the following file:

    <?xml version="1.0" encoding="UTF-8"?>
    <web-app>
      <servlet>
        <servlet-name>AsciiText</servlet-name>
        <servlet-class>com.googlecode.jj1.server.Jj1Servlet</servlet-class>
        <init-param>
          <param-name>services</param-name>
          <param-value>root=ch.kerbtier.asciiws.AsciiContext</param-value>
        </init-param>
      </servlet>
      <servlet-mapping>
        <servlet-name>AsciiText</servlet-name>
        <url-pattern>/ascii</url-pattern>
      </servlet-mapping>
    </web-app>

this instantiates a Jj1Servlet, loads an AsciiContext instance as a jj1 context and publishes its methods directly under the url /ascii.

build the whole stuff

go to your asciiText directory and type:

    javac -d bin \
        -classpath web/WEB-INF/lib/asciiGenerator.jar:web/WEB-INF/lib/jj1.0.1.jar \
        src/ch/kerbtier/asciiws/AsciiContext.java

this compiles AsciiContext.java and places the class file into the bin directory, inside the proper package folder. with the -classpath option you specify the jar files containing the classes AsciiContext depends on. on windows you need a ; instead of a : as the delimiter between classpath entries.

    jar cf dist/asciiws.jar -C bin ch

this creates a jar file out of the class… it's useful if you have lots of classes; with one it's just habit.

    cp dist/asciiws.jar web/WEB-INF/lib/

copies the generated jar file into the lib folder of the webapp.

    jar cf dist/asciiws.war -C web WEB-INF

this creates the file asciiws.war. now we just need to deploy it with tomcat. one easy way is to copy it into tomcat's webapps folder. after that the webservice should be accessible through the url http://localhost:8080/asciiws/ascii

levenshtein too slow, how to speed it up

for a little project i need to compare a string against a large set of strings. it should not only match identical strings, but also strings which are similar. to find out whether two strings are similar, there is an algorithm called "levenshtein". it takes two strings as arguments and returns the distance between them: the minimum number of single-character insertions, deletions and substitutions needed to turn one string into the other. if the distance is zero, the strings are equal; the bigger the distance, the more the strings differ.
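for reference, here is a minimal sketch of the classic dynamic-programming version of the algorithm (not the implementation i tested with). it keeps only two rows of the distance table, so memory grows with the length of one string, but the running time is still proportional to the product of both lengths. the class name Levenshtein is just an assumption for this example.

```java
// a minimal sketch of the classic dynamic-programming levenshtein distance
final class Levenshtein {

    private Levenshtein() {}

    // number of single-character insertions, deletions and substitutions
    // needed to turn a into b; keeps two rows instead of the full table
    static int distance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j; // turning "" into the first j chars of b takes j insertions
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i; // turning the first i chars of a into "" takes i deletions
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1,   // insertion
                                            prev[j] + 1),      // deletion
                                   prev[j - 1] + cost);        // substitution or match
            }
            int[] tmp = prev; // the current row becomes the previous one
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }
}
```

for example, `Levenshtein.distance("kitten", "sitting")` returns 3.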

too slow

i use it to compare strings which are about 200 characters long, and at the moment there are 40′000 of them. to compare one string to the existing set, i need to call the levenshtein algorithm 40′000 times. the algorithm itself is not very fast: for two strings of 200 characters it fills a table of 200 × 200 = 40′000 cells, so one comparison against the whole set means about 1.6 billion cell updates. i took the implementation from Levenshtein: Java Implementation to test with. i might get it faster if i write an optimized version, but i doubt that i would gain much. the problem is the 40′000 calls.

you don’t need 40′000 levenshtein calls

one way to make it faster is to rule out some strings by comparing lengths first. the levenshtein distance is always at least the difference between the two lengths, so if your maximum distance for two strings to match is 10, you can discard all strings whose character count differs by more than 10. this is already much faster, and for the 40′000 strings and my usage almost fast enough, but the faster the better. in this case i still have to compare the length of one string against the lengths of 40′000 other strings. this can be sped up by grouping the strings into sets by length and storing these sets in an array where the index is the length of the strings in the set. if l is the length of the string i want to test, i only need to test it against the strings in the sets with the indexes l-10 to l+10. with this technique i could probably rule out even more strings by, for example, counting the number of words or the number of vowels in the strings instead of the number of characters. these approaches could be combined and would probably speed it up a bit more. but statistically, the result would probably be about the same as when using the total length of the string.
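the length bucket idea can be sketched like this, assuming the maximum distance is passed as a parameter (10 in my case). LengthBucketIndex is a hypothetical name, and the distance method is a plain two-row levenshtein implementation, included so the class compiles on its own.

```java
import java.util.ArrayList;
import java.util.List;

// hypothetical index: strings are grouped into buckets by length, so a query
// is only compared against buckets whose length can still be within maxDistance
final class LengthBucketIndex {

    // buckets.get(n) holds all stored strings of length n
    private final List<List<String>> buckets = new ArrayList<List<String>>();

    void add(String s) {
        while (buckets.size() <= s.length()) {
            buckets.add(new ArrayList<String>());
        }
        buckets.get(s.length()).add(s);
    }

    // returns every stored string whose levenshtein distance to query is <= maxDistance
    List<String> findSimilar(String query, int maxDistance) {
        List<String> result = new ArrayList<String>();
        // the distance is at least the length difference, so other buckets are skipped
        int from = Math.max(0, query.length() - maxDistance);
        int to = Math.min(buckets.size() - 1, query.length() + maxDistance);
        for (int len = from; len <= to; len++) {
            for (String candidate : buckets.get(len)) {
                if (distance(query, candidate) <= maxDistance) {
                    result.add(candidate);
                }
            }
        }
        return result;
    }

    // classic two-row dynamic-programming levenshtein distance
    static int distance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }
}
```

with the numbers from above, a query of length 200 and a maximum distance of 10 only looks at the 21 buckets for the lengths 190 to 210.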

how much faster is it?

i only estimated the time by feel, so i don't know exactly how fast it is. with the string length approach i could skip about 85% of the strings, which still leaves about 6′000 comparisons. if the set of strings gets bigger, i'll soon have the same problem again. if i include the other two approaches or similar ones, i might get it to 90% or even 95%, but this still won't help if the set gets ten or a hundred times bigger.

a better approach?

a better approach would probably be to test against all strings in one go. for this the levenshtein algorithm would have to be rewritten, and i don't know if that's even possible. because i'm not a mathematician, i might run into problems with this approach, but i'll try it and post it here if it works.

install ubuntu from usb stick

i bought a "Gigabyte GA-GC230D, Atom 230" motherboard some weeks ago and now i finally had time to install linux on it. i was a bit surprised how much of a torture this was. there are lots of quite specific howtos, but it still took me hours of trying. in the end it was quite easy, but you have to know how. here is a list of sources i used; none of them did it on its own:

  • live usb pendrive persistent
  • installation from usb stick
  • how to install ubuntu on usb bar

preparing the usb stick

first, i created a single partition; there is usually already one on a usb stick. it has to be at least the size of two cds. then i formatted it with:

    sudo mkfs -t vfat /dev/sdx1

where /dev/sdx1 is the partition of the usb stick. be careful not to accidentally format another partition if you have serial-ata or scsi disks. i accidentally formatted my swap space :-) you can find out which device your usb stick is by typing:

    sudo fdisk -l

copy the files

you need an iso image of an install cd. i used the ubuntu 8.10 server image. after downloading it, i created a directory, mounted the iso on this directory and copied all the files to the usb stick. it is probably not necessary to copy all the files, but i was too lazy to test what exactly is needed. in my case the usb stick was mounted at /media/disk.

    mkdir ubuntuImage
    sudo mount -o loop /path/to/iso-image ubuntuImage
    cd ubuntuImage
    cp -Rf * /media/disk
    cp -Rf .disk /media/disk
    cp -Rf isolinux /media/disk/syslinux
    cd /media/disk/syslinux
    mv isolinux.cfg syslinux.cfg

that's it, the files are on the stick. during the installation there was a problem copying files from the stick. i solved it by copying /media/disk/dists/intrepid to /media/disk/dists/stable. on the cd, stable is a symbolic link, and symbolic links are not possible on a fat filesystem.

    cp -R /media/disk/dists/intrepid /media/disk/dists/stable

to "fix" another problem that occurs later, copy the whole iso image to the stick too.

make the drive bootable

to install the bootloader you need a command called syslinux. it does some magic to the usb stick. install it with:

    sudo apt-get install syslinux mtools

if your usb stick is mounted, unmount it. then use

    sudo syslinux /dev/sdx1

to install the bootloader. to make sure your stick has a proper master boot record, use the following command (install-mbr comes with the mbr package):

    sudo install-mbr /dev/sdx

booting from the stick

in the bios i had to activate an option called "legacy USB storage detect" and select USB-ZIP as the boot device. after that, ubuntu booted and the installer started. the first problem occurred when it tried to load the cd; it just wasn't able to. with alt-f2 you can switch to a console and mount the "cdrom" manually by typing:

    mount -t vfat /dev/sdx1 /cdrom

go back to the installer with alt-f1, retry the failed step and it should work now. after setting up network and disk, another error occurs: when trying to install the base system, the message "Failed to determine codename for the release" appears. go back to the install menu and select "load installer components from cd". select the iso option; it will find the image and the installation should continue without problems.

formatter for debugging json, xml and more

very often when i am developing applications, i have to work with external services. sometimes the communication is in xml, sometimes in json and sometimes in a very curious nonstandard format. xml and json both have the advantage of being human-readable, but very often this advantage is destroyed by horribly formatted xml and json streams, which makes debugging much harder. often i end up with a large chunk of xml or json with no line breaks in it. instead of installing an application that can properly format such a chunk, you can also use online tools. it is very helpful if you can just copy and paste the data and don't have to create a new file, paste it and open it. here is a list of a few:

json formatting

  • JSON Formatter: very powerful, great user interface, also validates input, customizable
  • JSON Format: it gets a bit slow with really large json strings

xml formatter

haven't found a good one yet.
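not an online tool, but if a jdk is at hand, the xml transformer built into java can pretty-print a chunk of xml. a minimal sketch, assuming the jdk's default TransformerFactory; the indent-amount property is specific to the xalan-based transformer shipped with the jdk, and other implementations may ignore it. XmlFormatter is a hypothetical name.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// hypothetical helper: re-indents an xml string with an identity transform
final class XmlFormatter {

    private XmlFormatter() {}

    static String format(String xml) throws TransformerException {
        // newTransformer() without a stylesheet copies the input unchanged
        Transformer t = TransformerFactory.newInstance().newTransformer();
        t.setOutputProperty(OutputKeys.INDENT, "yes");
        t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        // vendor-specific (xalan) property; namespaced, so other transformers may ignore it
        t.setOutputProperty("{http://xml.apache.org/xslt}indent-amount", "2");
        StringWriter out = new StringWriter();
        t.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
        return out.toString();
    }
}
```

for example, `XmlFormatter.format("<a><b>1</b></a>")` puts the b element on its own indented line.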

urlencoder/decoder

if you provide your data as a url parameter, it might be helpful to be able to decode it easily.
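in java, the standard URLDecoder and URLEncoder classes do this. a minimal sketch assuming the parameters are UTF-8 encoded (UrlParams is a hypothetical name). note that these classes implement the application/x-www-form-urlencoded format, so a + is decoded as a space and a space is encoded as +.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.net.URLEncoder;

// hypothetical helper for decoding and encoding url parameters as UTF-8
final class UrlParams {

    private UrlParams() {}

    static String decode(String s) throws UnsupportedEncodingException {
        return URLDecoder.decode(s, "UTF-8");
    }

    static String encode(String s) throws UnsupportedEncodingException {
        return URLEncoder.encode(s, "UTF-8");
    }
}
```

for example, `UrlParams.decode("a%20b+c%3D1")` returns `a b c=1`.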

html formatter

usually not used for communication but can be handy if you need to extract content of a website.

SQL Formatter

found it, so it is here.

mysql auto_increment with multi valued key

a few weeks ago i had to create a mysql database where a table had a primary key which consisted of a foreign key and a sequence number. my first try was a table like this:

    CREATE TABLE testTable (
      ref bigint not null,
      idx bigint not null auto_increment,
      PRIMARY KEY (ref, idx)
    ) ENGINE=MyISAM;

ref was a reference to another table, so it was given, and idx was used to differentiate between versions. one requirement was that idx should start at 1 for each different ref and count up from there. but i forgot about that requirement and created the table like above. i remembered the forgotten requirement a few days later, when the database was already in use. i was quite surprised to see that everything worked properly, exactly like it should. i had expected idx to be different for every single row, but it was only different among rows with the same ref. quite an interesting behavior for an auto_increment column, i thought, and consulted the mysql manual. it turns out that for MyISAM tables, if an auto_increment column is the secondary column of a multiple-column index, the generated value is the highest idx for the given ref plus one. if the row with the highest idx for a ref is deleted, that value is reused later. but this only works if the auto_increment column is not the first column of another index; in that case this index would generate the auto_increment values and they would be unique over all rows.

md5 hash function

md5 is a cryptographic hash function. the input is some data, a text string or whatever; the output is a fixed-length byte array (16 bytes for md5). it is practically impossible to reconstruct the original data from the output, and if the input changes just a tiny bit, the output is completely different. java supports multiple hash algorithms; md5 is often used but no longer considered secure. to create an md5 hash of a string you need roughly the following code:

    import java.security.MessageDigest;
    ....
    // here we store the output as a hex encoded string
    StringBuilder builder = new StringBuilder();
    MessageDigest md5 = MessageDigest.getInstance("MD5");
    // we use the string "HELLO" as input, encoded explicitly as UTF-8
    md5.update("HELLO".getBytes("UTF-8"));
    // now we have to iterate over the output byte array
    for (byte b : md5.digest()) {
        // java bytes are signed; & 0xff turns the byte into an int from 0 to 255
        int i = b & 0xff;
        if (i < 0x10) {
            // if it's less than 16, add a 0 for padding
            builder.append("0");
        }
        builder.append(Integer.toHexString(i));
    }
    // print the finished hex string
    System.out.println(builder.toString());

the output should be eb61eead90e3b899c6bcbe27ac581660. if you need only a single hash, you can use an online hash calculator. this one supports not only md5 but also older, less secure algorithms and newer, more secure ones.
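the same thing as a reusable method. this is a sketch; the name Md5.md5Hex is hypothetical. it passes an explicit charset so the result doesn't depend on the platform's default encoding, and uses String.format for the zero padding instead of the manual check.

```java
import java.io.UnsupportedEncodingException;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// hypothetical helper returning the md5 hash of a string as lowercase hex
final class Md5 {

    private Md5() {}

    static String md5Hex(String input)
            throws NoSuchAlgorithmException, UnsupportedEncodingException {
        MessageDigest md5 = MessageDigest.getInstance("MD5");
        // digest(byte[]) updates and finishes the hash in one call
        byte[] digest = md5.digest(input.getBytes("UTF-8"));
        StringBuilder hex = new StringBuilder(digest.length * 2);
        for (byte b : digest) {
            // %02x prints the unsigned byte value as two hex digits, zero padded
            hex.append(String.format("%02x", b & 0xff));
        }
        return hex.toString();
    }
}
```

for example, `Md5.md5Hex("hello")` returns `5d41402abc4b2a76b9719d911017c592`.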

apache and tomcat on debian

this is a little manual to get a debian etch server with apache2 and tomcat5.5 running.

the software you need

for tomcat you need a jdk. the one from sun is easy to install and stable. because it is in non-free, you need to change your /etc/apt/sources.list file. after that it should look roughly like:

    deb http://ftp.ch.debian.org/debian/ etch main non-free
    deb http://security.debian.org/ etch/updates main contrib non-free

usually it should only be necessary to add the non-free tags. after changing the file, update your package index with:

    apt-get update

we need apache2, the jdk, tomcat5.5 and, for the connection between tomcat and apache, mod_jk. to install them:

    apt-get install apache2 sun-java5-jdk tomcat5.5 libapache2-mod-jk

now you should get an "it works" page from apache at the url http://localhost. you may need to start tomcat with

    /etc/init.d/tomcat5.5 start

after that you should get a blank page at http://localhost:8180. the page is blank because no webapps are installed yet.

tomcat configuration

we need no fancy configuration stuff, no cluster, no tomcat manager, so we can reduce the whole /etc/tomcat5.5/server.xml config file to:

    <Server port="8005" shutdown="SHUTDOWN">
      <Service name="Catalina">
        <Connector port="8180" maxHttpHeaderSize="8192" maxThreads="150"
                   minSpareThreads="25" maxSpareThreads="75" enableLookups="false"
                   redirectPort="8443" acceptCount="100" connectionTimeout="20000"
                   disableUploadTimeout="true" />
        <Connector port="8009" enableLookups="false" redirectPort="8443" protocol="AJP/1.3" />
        <Engine name="Catalina" defaultHost="localhost">
          <Host name="localhost" appBase="webapps" unpackWARs="true" autoDeploy="true"
                xmlValidation="false" xmlNamespaceAware="false">
          </Host>
        </Engine>
      </Service>
    </Server>

you can also leave it as it was; it should make no difference. next we set up an example application. for that we create a "virtual host" so we can connect with a domain name, for example example.com. first we create a directory named example.com in /var/www and set its owner to tomcat55, because tomcat needs to write to this directory to deploy war files:

    mkdir /var/www/example.com
    chown tomcat55 /var/www/example.com

in the next step we set up the virtual host in tomcat. in the /etc/tomcat5.5/server.xml file we need to add a Host element to the Engine element. the Host element should look like:

    <Host name="example.com" appBase="/var/www/example.com" unpackWARs="true"
          autoDeploy="true" xmlValidation="false" xmlNamespaceAware="false">
    </Host>

after that we need to restart tomcat:

    /etc/init.d/tomcat5.5 restart

to test this, we can deploy an example war file. there is one downloadable from an apache server. we just need to go into the example.com directory and download it with wget.
we save it as ROOT.war because this creates the root webapp, the webapp which is reachable directly with no additional context path:

    cd /var/www/example.com
    wget -O ROOT.war http://tomcat.apache.org/tomcat-5.5-doc/appdev/sample/sample.war

after that we should be able to view the sample page at the url http://example.com:8180. but for that the domain example.com must point to the proper server. to do that, add the line

    192.168.1.100 example.com

to the /etc/hosts file on the client you are connecting from. the ip can be 127.0.0.1 if you connect from the server itself; otherwise it must be the ip of the server. the page at http://example.com:8180 should look like an example "hello world!" page.

configure apache

now tomcat is running, but we can only access the tomcat pages through port 8180. we surely want to access them through the default http port, 80. this port is already used by the apache server. for this we need mod_jk. it is a kind of proxy which requests the pages for some defined url patterns from tomcat and sends them back to the client. first we need to add a virtual host to the apache config. create a file named example.com in /etc/apache2/sites-available:

    <VirtualHost *>
        ServerAdmin your@email.com
        DocumentRoot /var/www/example.com/ROOT
        ServerName example.com
        ErrorLog /var/log/apache2/example.com.error.log
        CustomLog /var/log/apache2/example.com.access.log combined
        <Directory /var/www/example.com/ROOT>
            Options Indexes
        </Directory>
        <LocationMatch "/(WEB-INF|META-INF)/">
            Order allow,deny
            Deny from all
        </LocationMatch>
    </VirtualHost>

to activate the virtual host, create a symbolic link to example.com in /etc/apache2/sites-enabled:

    ln -s /etc/apache2/sites-available/example.com /etc/apache2/sites-enabled/example.com

and restart apache:

    /etc/init.d/apache2 restart

after that go to the url http://example.com. you should see the same page as before on port 8180. it is the same page, but now it is served by apache and not by tomcat. there are two links on this page. the first goes to a file called hello.jsp. this jsp page should be interpreted by tomcat, but it isn't; we see the source code. the second link, /hello, is a servlet and it isn't served by tomcat either. to fix that, we have to add two rules to the example.com apache config, inside the VirtualHost element:

    JkMount /hello ajp13_worker
    JkMount /*.jsp ajp13_worker

now these two patterns, the path /hello and all paths ending with .jsp, should be forwarded to tomcat. but first we need to set up mod_jk properly.

mod_jk to glue them together

there is already a mod_jk configuration, but like almost all config files it's much too complicated. the config file is /etc/libapache2-mod-jk/workers.properties and should roughly contain:

    worker.list=ajp13_worker
    worker.ajp13_worker.port=8009
    worker.ajp13_worker.host=localhost
    worker.ajp13_worker.type=ajp13

now the ajp13_worker is properly configured. the module is loaded by apache2, but apache doesn't know where the config file is. for that we create a file called jk.conf in /etc/apache2/mods-available:

    JkWorkersFile /etc/libapache2-mod-jk/workers.properties
    JkShmFile /var/run/apache2/jk-runtime-status
    JkLogFile /var/log/apache2/mod_jk.log
    JkLogLevel info

now we need a symbolic link to this file in /etc/apache2/mods-enabled:

    ln -s /etc/apache2/mods-available/jk.conf /etc/apache2/mods-enabled/jk.conf

then restart apache again and it should work.

clean up the mess

if it works, you can clean up, document and harden the whole config. for example, you should remove the http connector in tomcat (port 8180). because we can connect through apache, we don't need this connector anymore, and each closed port on a server improves security a bit. there are many settings to improve security and performance, and it's important to tweak them and adapt them to the server's needs. but that's your task. the config i used here might be very poor concerning security and performance.

http basic authentication in java

sometimes it's necessary to access webservices or websites from a java program. if the server uses basic authentication, you need to send the username and password with the request. because URLConnection doesn't provide a simple method to set them, you have to do it manually. http uses the Authorization header to transmit authentication information. for basic authentication this header looks like:

    Authorization: Basic dXNlcjpwYXNzd29yZA==

the string dXNlcjpwYXNzd29yZA== is the username and password, joined by a colon and encoded as base64. the unencoded string is user:password. in java you can use the setRequestProperty method of a URLConnection to add the header:

    import java.net.URL;
    import java.net.URLConnection;
    ...
    URLConnection uc = new URL("http://test.url").openConnection();
    uc.setRequestProperty("Authorization", "Basic dXNlcjpwYXNzd29yZA==");
    uc.connect();

instead of the dXNlcjpwYXNzd29yZA== string you need your own base64-encoded login and password. if they don't change during runtime, it's easiest to pre-encode them and store them in a config file. you can use an online base64 encoder to encode them.
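on java 8 or newer, the encoding can also be done in code with java.util.Base64, which older jdks did not have. a minimal sketch; BasicAuth and its methods are hypothetical names, and encoding user:password as UTF-8 is an assumption (a server may expect another charset for non-ascii characters).

```java
import java.io.IOException;
import java.net.URI;
import java.net.URLConnection;
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// hypothetical helper that builds the basic authentication header value
final class BasicAuth {

    private BasicAuth() {}

    // "user" + ":" + "password", base64-encoded, prefixed with "Basic "
    static String header(String user, String password) {
        String credentials = user + ":" + password;
        return "Basic " + Base64.getEncoder()
                .encodeToString(credentials.getBytes(StandardCharsets.UTF_8));
    }

    // opens (but does not yet connect) a connection with the header set
    static URLConnection open(String url, String user, String password) throws IOException {
        URLConnection uc = URI.create(url).toURL().openConnection();
        uc.setRequestProperty("Authorization", header(user, password));
        return uc;
    }
}
```

for example, `BasicAuth.header("user", "password")` returns `Basic dXNlcjpwYXNzd29yZA==`, the same value as in the header above.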