Archive for the ‘java’ Category

python, i like the snake

Monday, March 2nd, 2009
90% of my development work during the last few years was done in java. about a year ago i had to create a little project for the OLPC. OLPC uses python very heavily, so i developed the tool in python. before that, i had done only a few very simple things in python. a few weeks ago i started to develop web applications in python. here are a few thoughts and a list of the packages/ software i used.

python versus java

compared with java, python is IMHO not as clean. it isn’t as “designed” as java is. but compared with, for example, PHP, it’s a hyper clean and intuitive language. java is statically typed, and although it is bytecode compiled like python, it needs an explicit compilation step. this makes developing python programs a bit more agile. for both java and python there are lots of libraries/ packages available. as far as i have seen, the python ones usually seem to be of higher quality, but this may just be due to the selection i made and needn’t be true in general. the most important point in favour of python for webapps is that python programs usually need fewer lines of code than the equivalent java programs. from what i have seen so far, i think this is possible while keeping at least the same readability as java code. on the other hand, it’s easier to write horrible code in python than in java, so python needs a bit more discipline. i will use python mainly for small, not so critical projects, at least until i am fluent with it.

what did i use for webapps

with java i usually used an apache with mod_jk to connect to tomcat. i looked for a similar setup in python and found that there are a lot of possible ways, but the most promising one was the WSGI standard. there is a module for apache2 (mod_wsgi), so i was happy. WSGI is short for Web Server Gateway Interface and it sits between the webserver and the python application. in java, an application server shares a common context for all servlets/ requests. with WSGI, the python scripts are loaded in a separate context for each process (the apache processes, i think), so an in-memory cache or any other shared state only lives inside one process and isn’t shared between all requests. but if you are used to developing your applications with a cluster in mind, it doesn’t hurt. you can use memcached for a cache.

because WSGI is a very low level interface, it’s best to use a proper web framework. there are lots of them. most are full-blown frameworks and i am a bit allergic to them: i like to put together the tools i want and not be pushed into a certain way of doing things (you are never forced. with full-blown frameworks it’s the same as with certain girlfriends: they tend to give subtle hints but don’t force you, they are open to everything, but you get afraid of what awaits you if you really go with something other than their preferred way). the only one i kind of liked was web.py. it’s not the most beautiful, but it’s well built and lets you do whatever you want. i use it mainly for mapping urls to code and for web stuff like redirecting, sending errors and so on.

to access a database there are different packages available. i looked for ORMs (object-relational mappers) and found quite a lot of different implementations. the two best (as far as i can judge) are Storm and SQLAlchemy. both are flexible, easy to use and have lots of functionality that doesn’t get in your way if you don’t use it. compared with hibernate it’s a dream.

to render html or whatever, i was looking for a good template language. there are many of them and i decided to use Jinja2. it’s the one i liked most but, hey, it’s a template language: almost all of them are useful, properly built and support about the same feature set.

if i remember correctly, that’s all i needed, at least for the web aspect of the applications. python also includes many very useful modules in its standard library, so you don’t need to add lots of external ones. no crappy apache commons jars that provide functionality they forgot to implement in the standard library.

URLConnection and https

Saturday, January 31st, 2009
with a java.net.URLConnection i can connect to any http server. it’s also possible to connect to an https server. if i connect to an https server with a browser, i might get a message that the certificate is not trusted. i am prompted to examine the certificate and mark it as trusted. after that i can connect without any problems. the same must be done if i connect with a URLConnection: if the certificate of the https server is not trusted, a javax.net.ssl.SSLHandshakeException is thrown with the message “PKIX path building failed”… at least with the sun jvm version 1.5.

add certificate to a KeyStore

first we need to download the certificate from the webserver. this can be done with firefox. once you have accepted the server’s certificate you can save it by selecting Edit->Preferences->Advanced->Encryption->View Certificates->Your Certificates. there you select the certificate, click on export and save it somewhere on your harddisk. java cannot work with this certificate file directly… actually it can, but it’s easier to import it into a KeyStore file with the following command:

    keytool -import -alias aliasOfCertificate -file certificateFile.cer -keystore myKeystore

the keytool program is distributed with the jdk. the command adds certificateFile.cer as a trusted certificate to the keystore file named myKeystore. the tool prompts for a password, which protects the keystore file; you need it again whenever the keystore is loaded.
instead of adding the certificate to myKeystore we could also add it to the default keystore of the jvm. this is done with:

    keytool -import -alias aliasOfCertificate -file certificateFile.cer -keystore $JAVA_HOME/lib/security/cacerts

and the default password “changeit” (if $JAVA_HOME points to a jdk older than java 9, the file is $JAVA_HOME/jre/lib/security/cacerts). this usually needs root privileges and changes the trusted certificates of every java program that uses this jvm. it’s a bit like polluting the “global” environment, so it’s better to avoid it.
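a middle ground between editing cacerts and building your own socket factory is to point a single jvm at myKeystore with the standard javax.net.ssl.trustStore and javax.net.ssl.trustStorePassword system properties. the sketch below assumes the placeholder path and password from the keytool step; the class name TrustStoreConfig is hypothetical. the properties have to be set before the first https connection initialises the default SSL context, and they replace cacerts for that jvm instead of adding to it.

```java
// Sketch: make the default SSL context of this JVM use myKeystore as its
// trust store, without touching cacerts. Path and password are placeholders.
class TrustStoreConfig {

    static void useTrustStore(String path, String password) {
        // must run before the first HTTPS connection is opened, because the
        // default SSLContext reads these properties when it is initialised
        System.setProperty("javax.net.ssl.trustStore", path);
        System.setProperty("javax.net.ssl.trustStorePassword", password);
    }

    public static void main(String[] args) {
        useTrustStore("path/to/myKeystore", "PasswordUsedWithKeytool");
        System.out.println(System.getProperty("javax.net.ssl.trustStore"));
    }
}
```

the same can be done from the command line with java -Djavax.net.ssl.trustStore=path/to/myKeystore -Djavax.net.ssl.trustStorePassword=... when starting the program.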

use that keystore

if i have a URLConnection with https as protocol, it’s an instance of HttpsURLConnection and i can simply cast to it. HttpsURLConnection has a method setSSLSocketFactory. the socket factory can be configured to accept certain certificates or not. a socket factory which accepts the certificates in myKeystore can be created with the following code:

    // imports: java.io.*, java.net.*, java.security.KeyStore, javax.net.ssl.*
    InputStream in = new FileInputStream(new File("path/to/myKeystore"));
    KeyStore ks = KeyStore.getInstance(KeyStore.getDefaultType());
    ks.load(in, "PasswordUsedWithKeytool".toCharArray());
    in.close();
    TrustManagerFactory tmf = TrustManagerFactory.getInstance(TrustManagerFactory.getDefaultAlgorithm());
    tmf.init(ks);
    X509TrustManager defaultTrustManager = (X509TrustManager) tmf.getTrustManagers()[0];
    SSLContext context = SSLContext.getInstance("TLS");
    context.init(null, new TrustManager[] {defaultTrustManager}, null);
    SSLSocketFactory sslSocketFactory = context.getSocketFactory();

here the keystore is loaded first. you have to provide the password you typed in when creating the keystore file. after that a TrustManager is created via a TrustManagerFactory initialised with our KeyStore. then the SSLContext is created and initialised with the trust manager. finally an SSLSocketFactory is obtained from the getSocketFactory method of the SSLContext. we can use it for our URLConnection like this:

    URL url = new URL("https://thesecuredomain.org");
    URLConnection con = url.openConnection();
    ((HttpsURLConnection) con).setSSLSocketFactory(sslSocketFactory);
    con.connect();
    in = con.getInputStream();
    ...
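if the same keystore is needed in several places, the steps of this post can be wrapped into two reusable methods. a minimal sketch; the class and method names are hypothetical, the keystore is closed with try-with-resources (java 7+), and all trust managers returned by the factory are passed on instead of only the first one.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.GeneralSecurityException;
import java.security.KeyStore;
import javax.net.ssl.SSLContext;
import javax.net.ssl.SSLSocketFactory;
import javax.net.ssl.TrustManagerFactory;

// Sketch: keystore -> TrustManagerFactory -> SSLContext -> SSLSocketFactory,
// split into reusable methods. Names are illustrative, not from a library.
class KeyStoreSockets {

    // load a keystore file created with keytool (JVM default keystore type)
    static KeyStore loadKeyStore(Path file, char[] password)
            throws IOException, GeneralSecurityException {
        KeyStore ks = KeyStore.getInstance(KeyStore.getDefaultType());
        try (InputStream in = Files.newInputStream(file)) { // closed even on error
            ks.load(in, password);
        }
        return ks;
    }

    // socket factory that only trusts certificates chaining to entries in ks
    static SSLSocketFactory socketFactoryFor(KeyStore ks) throws GeneralSecurityException {
        TrustManagerFactory tmf =
                TrustManagerFactory.getInstance(TrustManagerFactory.getDefaultAlgorithm());
        tmf.init(ks);
        SSLContext context = SSLContext.getInstance("TLS");
        // hand over every trust manager instead of casting index 0
        context.init(null, tmf.getTrustManagers(), null);
        return context.getSocketFactory();
    }
}
```

usage stays the same as above: cast the connection to HttpsURLConnection and call setSSLSocketFactory(KeyStoreSockets.socketFactoryFor(KeyStoreSockets.loadKeyStore(path, password))) before connect().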

jj1 webservice step by step

Friday, January 16th, 2009
we will create a little json-rpc webservice in java using jj1. it’s a step by step tutorial using a fresh install of tomcat and java. if you use an ide it should be easy to adapt. the webservice will generate an ascii banner from a string. the code to generate the ascii banner already exists and just needs to be made accessible as a webservice.

what you need

you need a default tomcat installation and a jdk (java development kit). it contains the jvm (java virtual machine) as well as the javac and jar tools used to build the webapp. to test if a jvm is available, type java -version at the command prompt. if something like

    java version "1.6.0_07"
    Java(TM) SE Runtime Environment (build 1.6.0_07-b06)
    Java HotSpot(TM) Server VM (build 10.0-b23, mixed mode)

appears, you have one installed. otherwise download one from sun or use your package manager to install one. the easiest way to install tomcat is with your package manager if you use a linux-like os. if there is no tomcat package, take a look at the tomcat setup page.

directories

i usually create a main folder named after the project. in this folder there is a src folder for the java source, a bin folder for the compiled files and a dist folder for jars and wars. if it’s a webapp there is also a web folder which contains the webapp itself. so for our little example app it looks like:

    asciiText
    asciiText/src
    asciiText/bin
    asciiText/dist
    asciiText/web
    asciiText/web/WEB-INF
    asciiText/web/WEB-INF/lib

dependencies

we need three jars. first the jj1 jar. the next one is the stringtree jar, which is used to encode/ decode json. finally we need the service implementation (the jar with the AsciiRenderer class). save all the jars in the asciiText/web/WEB-INF/lib folder.

create the context object

to access a method via json-rpc we have to expose this method to the web. in jj1 this is done via a context object. this is our first… and only java class. we create it in a ch/kerbtier/asciiws subfolder of the src folder. the ch/kerbtier/asciiws folders represent the namespace or package the class lives in. the file should be named AsciiContext.java and look like this:

    package ch.kerbtier.asciiws;

    import ch.kerbtier.asciitext.AsciiRenderer;
    import com.googlecode.jj1.server.JsonRpc;

    public class AsciiContext {
        private AsciiRenderer renderer = new AsciiRenderer();

        @JsonRpc
        public String getText(String input, String font, int size) {
            return renderer.createAscii(input, font, size);
        }
    }

that’s it. the JsonRpc annotation publishes the method getText as a webservice. internally it just calls the createAscii method of the renderer.

web.xml

a webapplication is configured by a web.xml file which lives in the WEB-INF folder of the webapp. we need the following file:

    <?xml version="1.0" encoding="UTF-8"?>
    <web-app>
      <servlet>
        <servlet-name>AsciiText</servlet-name>
        <servlet-class>com.googlecode.jj1.server.Jj1Servlet</servlet-class>
        <init-param>
          <param-name>services</param-name>
          <param-value>root=ch.kerbtier.asciiws.AsciiContext</param-value>
        </init-param>
      </servlet>
      <servlet-mapping>
        <servlet-name>AsciiText</servlet-name>
        <url-pattern>/ascii</url-pattern>
      </servlet-mapping>
    </web-app>

this instantiates a Jj1Servlet, loads an AsciiContext instance as a jj1 context and publishes its methods directly under the url /ascii.

build the whole stuff

go to your asciiText directory and type:

    javac -d bin -classpath web/WEB-INF/lib/asciiGenerator.jar:web/WEB-INF/lib/jj1.0.1.jar src/ch/kerbtier/asciiws/AsciiContext.java

this compiles AsciiContext.java and places the class file into the bin directory, in the proper package folder. with the -classpath option you specify the jar files containing the classes AsciiContext depends on. on windows you need a ; instead of a : as delimiter between classpath entries.

    jar cf dist/asciiws.jar -C bin ch

this creates a jar file from the class… it’s useful if you have lots of classes; with one it’s just habit.

    cp dist/asciiws.jar web/WEB-INF/lib/

copies the generated jar file into the lib folder of the webapp.

    jar cf dist/asciiws.war -C web WEB-INF

this creates the file asciiws.war. now we just need to deploy it with tomcat. one easy way is to copy it into the webapps folder of tomcat. after that the webservice should be accessible through the url http://localhost:8080/asciiws/ascii

levenshtein too slow, how to speed it up

Tuesday, December 30th, 2008
for a little project i need to compare a string against a large set of strings. it should not only match identical strings, it should also match strings which are similar. an algorithm to find out how similar two strings are is the “levenshtein” distance. it takes two strings as arguments and returns the distance between them, i.e. the minimum number of single-character insertions, deletions and substitutions needed to turn one into the other. if the distance is zero the strings are equal. the bigger the distance, the more the strings differ.
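for reference, a minimal sketch of the standard dynamic-programming levenshtein distance (not the implementation linked in this post). it keeps only two rows of the table, so memory stays small, but the work is still one cell per pair of characters: for two 200-character strings that is about 200 × 200 = 40′000 cell updates, so 40′000 comparisons add up to roughly 1.6 billion.

```java
// Sketch: classic dynamic-programming Levenshtein distance that keeps only
// two rows of the table, so memory is O(length of b) instead of O(n * m).
class Levenshtein {

    static int distance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j; // turning "" into the first j chars of b: j insertions
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i; // turning the first i chars of a into "": i deletions
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, // insertion
                                            prev[j] + 1),    // deletion
                                   prev[j - 1] + cost);      // substitution or match
            }
            int[] tmp = prev; // the current row becomes the previous one
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }
}
```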

too slow

i use it to compare strings which are about 200 chars long, and at the moment there are 40′000 of them. to compare one string with the existing set i need to call the levenshtein algorithm 40′000 times. because the algorithm itself is not super fast, the comparison takes a long time. i took an implementation from Levenshtein: Java Implementation for testing. i might get it faster by writing an optimized version, but i doubt i’d gain much. the problem is the 40′000 calls.

you don’t need 40′000 levenshtein calls

one way to make it faster is to rule out some strings by comparing their lengths first. if the maximum distance for two strings to match is 10, you can discard all strings whose length differs by more than 10, because the levenshtein distance is never smaller than the difference in length. this is already much faster, and for 40′000 strings and my usage almost fast enough, but the faster the better. in this case i still have to compare the length of one string against the lengths of 40′000 other strings. this can be sped up by grouping the strings into sets by length and storing these sets in an array indexed by length. then i don’t have to compare against all strings: if l is the length of the string i want to test, i only need to test it against the strings in the sets at the indexes l-10 to l+10. maybe i can rule out even more strings with this technique, for example by counting the number of words or the number of vowels instead of the number of characters. these approaches could be combined, which would probably speed it up a bit more. but statistically the result would probably be about the same as with the total length of the string.
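the length index described above can be sketched in a few lines. this is a minimal version under the assumption that an ArrayList of buckets is acceptable (the class name LengthIndex is hypothetical); it only returns the candidates, which would then still be checked with the levenshtein function.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of the length prefilter: strings are grouped into buckets indexed
// by their length, and only buckets within maxDistance of the query length
// are returned as candidates for the (expensive) Levenshtein check.
class LengthIndex {

    private final List<List<String>> buckets = new ArrayList<>();

    void add(String s) {
        while (buckets.size() <= s.length()) {
            buckets.add(new ArrayList<>()); // grow until index s.length() exists
        }
        buckets.get(s.length()).add(s);
    }

    // the edit distance is never smaller than the length difference, so
    // strings outside [len - maxDistance, len + maxDistance] cannot match
    List<String> candidates(String query, int maxDistance) {
        List<String> result = new ArrayList<>();
        int from = Math.max(0, query.length() - maxDistance);
        int to = Math.min(buckets.size() - 1, query.length() + maxDistance);
        for (int len = from; len <= to; len++) {
            result.addAll(buckets.get(len));
        }
        return result;
    }
}
```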

how much faster is it?

i measured the time only by feel and don’t know exactly how fast it is. with the string length approach i could rule out about 85% of the strings. this is faster, but it still leaves about 6′000 strings to compare. if the set of strings to test against gets bigger, i’ll soon have the same problem again. if i include the other two approaches or similar ones, i might get to 90% or even 95%, but this still won’t help if the set gets ten or a hundred times bigger.

a better approach?

a better approach would probably be to test against all strings in one go. for this the levenshtein algorithm has to be rewritten, and i don’t know if that’s even possible. because i’m not a mathematician i might have problems with this approach, but i’ll try it and post it here if it works.
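one known technique in this direction (a swapped-in idea, not something this post arrived at) stores all strings in a trie and computes one levenshtein row per trie node. strings that share a prefix then share the rows for that prefix, and a whole branch can be skipped as soon as every value in its row exceeds the maximum distance, because the values can only grow further down. a minimal sketch with hypothetical names; how much it saves depends on how many prefixes the strings actually share.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Sketch: Levenshtein search over a whole set at once, using a trie.
// Each trie node gets one DP row (distance of its prefix to every query
// prefix), computed from its parent's row.
class TrieSearch {

    private static final class Node {
        final Map<Character, Node> children = new TreeMap<>();
        String word; // non-null if a stored string ends at this node
    }

    private final Node root = new Node();

    void add(String s) {
        Node n = root;
        for (char c : s.toCharArray()) {
            n = n.children.computeIfAbsent(c, k -> new Node());
        }
        n.word = s;
    }

    List<String> search(String query, int maxDistance) {
        int[] firstRow = new int[query.length() + 1];
        for (int j = 0; j <= query.length(); j++) {
            firstRow[j] = j; // distance from the empty prefix
        }
        List<String> results = new ArrayList<>();
        if (root.word != null && firstRow[query.length()] <= maxDistance) {
            results.add(root.word); // the empty string was stored
        }
        for (Map.Entry<Character, Node> e : root.children.entrySet()) {
            walk(e.getValue(), e.getKey(), query, firstRow, maxDistance, results);
        }
        return results;
    }

    private void walk(Node node, char c, String query, int[] prevRow,
                      int maxDistance, List<String> results) {
        int[] row = new int[query.length() + 1];
        row[0] = prevRow[0] + 1;
        int best = row[0];
        for (int j = 1; j <= query.length(); j++) {
            int cost = query.charAt(j - 1) == c ? 0 : 1;
            row[j] = Math.min(Math.min(row[j - 1] + 1, prevRow[j] + 1), prevRow[j - 1] + cost);
            best = Math.min(best, row[j]);
        }
        if (node.word != null && row[query.length()] <= maxDistance) {
            results.add(node.word);
        }
        if (best <= maxDistance) { // otherwise no longer string below can match
            for (Map.Entry<Character, Node> e : node.children.entrySet()) {
                walk(e.getValue(), e.getKey(), query, row, maxDistance, results);
            }
        }
    }
}
```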

md5 hash function

Sunday, December 7th, 2008
md5 is a cryptographic hash function. the input is some data: a text string or whatever. the output is a fixed-length byte array (16 bytes for md5). it is practically impossible to reconstruct the original data from the output. if the input changes just a tiny bit, the output is completely different. java supports multiple hash algorithms; one of the not so secure but often used ones is md5. to create an md5 hash of a string you need roughly the following code:

    import java.security.MessageDigest;
    ....
    // here we store the output as a hex encoded string
    StringBuilder builder = new StringBuilder();
    // getInstance throws the checked NoSuchAlgorithmException
    MessageDigest md5 = MessageDigest.getInstance("MD5");
    // we use the string "HELLO" as input
    md5.update("HELLO".getBytes());
    // now we have to iterate over the output byte array
    for (byte b : md5.digest()) {
        // make an int out of the byte. the & 0xff removes the sign extension
        int i = b & 0xff;
        if (i < 0x10) {
            // if it's less than 16, add a 0 for padding
            builder.append("0");
        }
        builder.append(Integer.toHexString(i));
    }
    // print the finished hex string
    System.out.println(builder.toString());

the output should be eb61eead90e3b899c6bcbe27ac581660. if you need only a single hash you can use an online hash calculator. this one supports not only md5 but also older, less secure algorithms and newer, more secure ones.
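the same steps can be packed into a reusable method. a minimal sketch (the class name Md5 is hypothetical): it encodes the input explicitly as UTF-8, because getBytes() without an argument depends on the platform default charset, and it uses String.format("%02x", …) for the zero-padded hex digits instead of the manual padding above.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Sketch: md5 of a string as a lowercase hex string, with an explicit
// charset so the result does not depend on the platform default.
class Md5 {

    static String md5Hex(String text) {
        MessageDigest md5;
        try {
            md5 = MessageDigest.getInstance("MD5");
        } catch (NoSuchAlgorithmException e) {
            // every Java platform is required to support MD5
            throw new IllegalStateException(e);
        }
        byte[] digest = md5.digest(text.getBytes(StandardCharsets.UTF_8));
        StringBuilder builder = new StringBuilder(digest.length * 2);
        for (byte b : digest) {
            // %02x: two lowercase hex digits, zero padded
            builder.append(String.format("%02x", b & 0xff));
        }
        return builder.toString();
    }

    public static void main(String[] args) {
        System.out.println(md5Hex("HELLO"));
    }
}
```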

http basic authentication in java

Saturday, November 22nd, 2008
sometimes it’s necessary to access webservices or websites from a java program. if the server uses basic authentication you need to provide the username and password with the connection. because URLConnection doesn’t provide a simple method to set them, you have to do it manually. http uses the Authorization header to transmit authentication information. for basic authentication this header looks like:

    Authorization: Basic dXNlcjpwYXNzd29yZA==

the string dXNlcjpwYXNzd29yZA== is the username and password, encoded as base64. the unencoded string is user:password. in java you can use the setRequestProperty method of a URLConnection to add the header:

    URLConnection uc = new URL("http://test.url").openConnection();
    uc.setRequestProperty("Authorization", "Basic dXNlcjpwYXNzd29yZA==");
    uc.connect();

instead of the dXNlcjpwYXNzd29yZA== string you need your own base64 encoded username and password. if it doesn’t change during runtime it’s best to pre-encode it and store it in a config file. you can use an online base64 encoder to encode it. keep in mind that base64 is only an encoding, not encryption: anyone who can read the config file or the http traffic can decode the credentials.
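on java 8 and later (newer than this post), the header can also be built at runtime with java.util.Base64 instead of an online encoder. a minimal sketch; the class and method names are hypothetical, and the header has to be set before connect() is called.

```java
import java.net.URLConnection;
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Sketch: build the basic authentication header at runtime.
// java.util.Base64 is available from Java 8 on.
class BasicAuth {

    static String header(String user, String password) {
        String credentials = user + ":" + password;
        return "Basic " + Base64.getEncoder()
                .encodeToString(credentials.getBytes(StandardCharsets.UTF_8));
    }

    // must be called before connect(), like setRequestProperty above
    static void apply(URLConnection connection, String user, String password) {
        connection.setRequestProperty("Authorization", header(user, password));
    }
}
```

for example, BasicAuth.header("user", "password") produces the same Basic dXNlcjpwYXNzd29yZA== value used above.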