<HTML>
<!-- $Id: sim.html,v 1.7 2007/08/27 09:57:35 dick Exp $ -->
<HEAD>
<TITLE>The software and text similarity tester SIM</TITLE>
</HEAD>

<BODY>
<H1>The software and text similarity tester SIM</H1>

<H2>
<A HREF="http://www.cs.vu.nl/~dick/">Dick Grune</A>
</H2>

<A HREF="ftp://ftp.cs.vu.nl/pub/dick/similarity_tester/README.1st">SIM</A>
tests lexical similarity between texts written in C, Java, Pascal, Modula-2,
Lisp, Miranda, and natural language.
It is used
<UL>

<LI>
to detect potentially duplicated code fragments in large software
projects, in program text, in shell scripts and in documentation
</LI>

<LI>
to detect plagiarism in software projects, educational and otherwise
</LI>

</UL>

<P>
SIM 2.19 is available as
<A HREF="ftp://ftp.cs.vu.nl/pub/dick/similarity_tester/sim_2_19.shar">C sources</A>
and as
<A HREF="ftp://ftp.cs.vu.nl/pub/dick/similarity_tester/sim_2_19.zip">MSDOS binaries</A>.
It is also available through ftp; the directory is
<A HREF="ftp://ftp.cs.vu.nl/pub/dick/similarity_tester">ftp.cs.vu.nl:/pub/dick/similarity_tester</A>.
There is a
<A HREF="ftp://ftp.cs.vu.nl/pub/dick/similarity_tester/sim.pdf">Unix-style manual page</A>.
</P>

<P>
The software similarity tester is very efficient and allows us to compare
this year's students' work with that collected over many past years (much to
the dismay of some, mostly non-CS, students).
Students are told that their work is going to be compared, but some are
non-believers ...
</P>

<P>
The output of the similarity tester can be processed by a number of shell
scripts by Matty Huntjens
(<A HREF="http://www.cs.vu.nl/~matty/">matty@cs.vu.nl</A>).
These shell scripts take sim output and produce lists of suspect submissions,
histograms and the like.
The present version of these scripts is, however, very much geared to the
local situation at
<A HREF="http://www.vu.nl/">VU University Amsterdam</A>;
they are low on portability.
</P>

<P>
We are not afraid that students will try to tune their work to the
similarity tester.
We reckon that if they can do that, they can also do the exercise.
</P>

<P>
Since this piece of handicraft does not qualify as research, there are no
international papers on it.
The work was described in Dutch in
Dick Grune,
Matty Huntjens,
<A HREF="ftp://ftp.cs.vu.nl/pub/dick/publications/Het_detecteren_van_kopieen_bij_informatica-practica.ps">Het detecteren van kopieën bij informatica-practica</A>,
Informatie,
<STRONG>31</STRONG>,
11,
Nov 1989,
pp. 864-867
(<A HREF="ftp://ftp.cs.vu.nl/pub/dick/similarity_tester/artikel.lit">lit. ref.</A>).
An
<A HREF="ftp://ftp.cs.vu.nl/pub/dick/similarity_tester/Paper.ps">English translation</A>
of the paper is also available.
The ftp directory contains a terse
<A HREF="ftp://ftp.cs.vu.nl/pub/dick/similarity_tester/TechnReport">technical report</A>
about the internal workings of the program.
</P>

<H5>
<HR>
[<A HREF="CVS.html">Previous</A>]
[<A HREF="mag.html">Next</A>]
[<A HREF="http://www.cs.vu.nl/~dick/dick.html">Personal Page</A>]
[<A HREF="http://www.cs.vu.nl/~dick/">Professional Page</A>]
[<A HREF="http://www.cs.vu.nl/">CS</A>]
[<A HREF="http://www.few.vu.nl/">Faculty</A>]
[<A HREF="http://www.vu.nl/">VU University Amsterdam</A>]
<HR>
</H5>

<ADDRESS>
The software and text similarity tester SIM / Dick Grune /
<A HREF="mailto:dick@cs.vu.nl">dick@cs.vu.nl</A>
</ADDRESS>

</BODY>
</HTML>