SIM 2.19 is available as C sources and as MS-DOS binaries. It is also available through ftp; the directory is ftp.cs.vu.nl:/pub/dick/similarity_tester. There is a Unix-style manual page.
The software similarity tester is very efficient and allows us to compare this year's students' work with that collected from many past years (much to the dismay of some, mostly non-CS, students). Students are told that their work is going to be compared, but some are non-believers ...
The output of the similarity tester can be processed by a number of shell scripts by Matty Huntjens (matty@cs.vu.nl). These shell scripts take sim output and produce lists of suspect submissions, histograms and the like. The present version of these scripts is closely geared to the local situation at the VU University Amsterdam, though, and is not very portable.
We are not afraid that students will try to tune their work to the similarity tester. We reckon that if they can do that, they can also do the exercise.
Since this piece of handicraft does not qualify as research, there are no international papers on it. The work was described in Dutch in Dick Grune, Matty Huntjens, Het detecteren van kopieën bij informatica-practica, Informatie, 31, 11, Nov 1989, pp. 864-867 (lit. ref.). An English translation of the paper is also available. The ftp directory contains a terse technical report about the internal workings of the program.