User Commands SIM(1)

NAME
sim - find similarities in C, Java, Pascal, Modula-2, Lisp, Miranda or text files

SYNOPSIS
sim_c [ -[defFnpsS] -r N -w N -o F ] file ... [ / [ file ... ] ]
sim_c ...
sim_java ...
sim_pasc ...
sim_m2 ...
sim_lisp ...
sim_mira ...
sim_text ...

DESCRIPTION
Sim_c reads the C files file ... and looks for pieces of text that are similar; two pieces of program text are similar if they differ only in layout, comments, identifiers and the contents of numbers, strings and characters. Any runs of sufficient length (see -r) are reported on standard output; the number of significant tokens in each run is given between square brackets.
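
To illustrate the notion of similarity above, the following C sketch normalizes a fragment of C text into a token string: layout and /* */ comments are dropped, identifiers become ID, numbers NUM, and string and character literals STR and CHR, while keywords and punctuation are kept. The function name normalize, the token spellings and the tiny keyword list are hypothetical; sim's own lexer is different and covers far more of each language.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Illustrative keyword subset; keywords are kept verbatim, not renamed. */
static const char *const keywords[] = { "if", "else", "for", "while", "return", "int", NULL };

static int is_keyword(const char *s, size_t n) {
    for (const char *const *k = keywords; *k != NULL; k++)
        if (strlen(*k) == n && strncmp(*k, s, n) == 0)
            return 1;
    return 0;
}

/* Appends the n characters of tok and a separating space; 0 if out is full. */
static int emit(char *out, size_t cap, size_t *len, const char *tok, size_t n) {
    if (*len + n + 2 > cap)
        return 0;
    memcpy(out + *len, tok, n);
    *len += n;
    out[(*len)++] = ' ';
    out[*len] = '\0';
    return 1;
}

/* Writes the normalized token string of the C text p into out (capacity cap).
   Returns 1 on success, 0 if out is too small. Only block comments are handled. */
int normalize(const char *p, char *out, size_t cap) {
    size_t len = 0;
    int ok = 1;
    if (cap == 0)
        return 0;
    out[0] = '\0';
    while (*p != '\0' && ok) {
        if (isspace((unsigned char)*p)) {                 /* layout: ignored */
            p++;
        } else if (p[0] == '/' && p[1] == '*') {          /* comment: ignored */
            const char *end = strstr(p + 2, "*/");
            p = end != NULL ? end + 2 : p + strlen(p);
        } else if (isalpha((unsigned char)*p) || *p == '_') {
            const char *s = p;                            /* identifier or keyword */
            while (isalnum((unsigned char)*p) || *p == '_')
                p++;
            size_t n = (size_t)(p - s);
            ok = is_keyword(s, n) ? emit(out, cap, &len, s, n)
                                  : emit(out, cap, &len, "ID", 2);
        } else if (isdigit((unsigned char)*p)) {          /* number: contents ignored */
            while (isalnum((unsigned char)*p) || *p == '.')
                p++;
            ok = emit(out, cap, &len, "NUM", 3);
        } else if (*p == '"' || *p == '\'') {             /* string/char: contents ignored */
            char q = *p++;
            while (*p != '\0' && *p != q) {
                if (*p == '\\' && p[1] != '\0')
                    p++;                                  /* skip the escaped character */
                p++;
            }
            if (*p == q)
                p++;
            ok = emit(out, cap, &len, q == '"' ? "STR" : "CHR", 3);
        } else {                                          /* operator or punctuation */
            ok = emit(out, cap, &len, p, 1);
            p++;
        }
    }
    return ok;
}
```

Two fragments that differ only in layout, comments, identifier names and literal values yield the same token string, which is the kind of match the description above refers to.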

Sim_java does the same for Java, sim_pasc for Pascal, sim_m2 for Modula-2, sim_lisp for Lisp, and sim_mira for Miranda. Sim_text works on arbitrary text; it is occasionally useful on shell scripts.

The program can be used for finding copied pieces of code in purportedly unrelated programs (with -s or -S), or for finding accidentally duplicated code in larger projects (with -f).

If a / is present between the input files, the input files are divided into a group of "new" files (before the /) and a group of "old" files (after it); if there is no /, all files are "new". Old files are never compared to each other. Since the similarity tester reads the files several times, it cannot read from standard input.
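
The pairing rules above, combined with the -s and -S options listed below, can be summarized as a predicate over pairs of input files. In this C sketch should_compare and struct pair_opts are hypothetical names, and it is an assumption that an old file is not compared with itself either; the page only states that old files are never compared to each other.

```c
#include <stdbool.h>

/* Hypothetical flags standing for the -s and -S options. */
struct pair_opts {
    bool not_self;    /* -s: a file is not compared with itself */
    bool new_vs_old;  /* -S: new files are compared to old files only */
};

/* Returns true if input files i and j (i == j means self-comparison) are to be
   compared. is_old[k] is true for files listed after the '/'. */
bool should_compare(int i, int j, const bool is_old[], struct pair_opts o) {
    bool old_i = is_old[i], old_j = is_old[j];
    if (old_i && old_j)
        return false;          /* old files are never compared to each other
                                  (assumed to include an old file with itself) */
    if (i == j && o.not_self)
        return false;          /* -s */
    if (o.new_vs_old && !old_i && !old_j)
        return false;          /* -S: only new/old pairs remain */
    return true;
}
```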

The following options are available:

-d    The output is in a diff(1)-like format instead of the default 2-column format.

-e    Each file is compared to each file in isolation; this will find all similarities between all texts involved, regardless of duplicates.

-f    Runs are restricted to pieces with balanced parentheses, to isolate potential functions (C, Java, Pascal, Modula-2 and Lisp only).

-F    The names of functions in calls are required to match exactly (C, Java, Pascal, Modula-2 and Lisp only).

-n    Similarities found are only summarized, not displayed.

-o F  The output is written to the file named F.

-p    The output is given in similarity percentages; see below.

-r N  The minimum run length is set to N tokens (default is N = 24).

-s    The contents of a file are not compared to themselves (-s = not self).

-S    The new files are compared only to the old files, not to each other.

-w N  The page width used is set to N columns (default is N = 80).
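
The restriction imposed by -f can be pictured as a filter on candidate runs. The following C sketch accepts a run only if its brackets nest properly. The name brackets_balance and the one-character-per-token representation are hypothetical, and it is an assumption that (), [] and {} are all checked; this page does not say which bracket kinds sim considers for each language.

```c
#include <stdbool.h>
#include <stddef.h>

#define MAX_DEPTH 256  /* arbitrary nesting limit for this sketch */

/* Returns true if every closing bracket in run[0..n) matches the most recent
   unmatched opening bracket and no opening bracket is left unmatched. */
bool brackets_balance(const char *run, size_t n) {
    char stack[MAX_DEPTH];
    size_t top = 0;
    for (size_t k = 0; k < n; k++) {
        char c = run[k];
        if (c == '(' || c == '[' || c == '{') {
            if (top == MAX_DEPTH)
                return false;                  /* too deep: treat as unbalanced */
            stack[top++] = c;
        } else if (c == ')' || c == ']' || c == '}') {
            char want = c == ')' ? '(' : c == ']' ? '[' : '{';
            if (top == 0 || stack[--top] != want)
                return false;                  /* unmatched or mismatched closer */
        }
    }
    return top == 0;
}
```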

The -p option results in lines of the form "F consists for x % of G material", meaning that x % of F's text can also be found in G. Note that this relation is not symmetric; it is in fact quite possible for one file to consist 100 % of text from another file while only 1 % of the other file's text comes from the first, if their lengths differ enough. Note also that the granularity of the recognized text is still governed by the -r option or its default.
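
As a worked example of this asymmetry, suppose F has 200 significant tokens, all of which occur in G, and G has 20000. Each percentage is taken relative to the length of the file it describes, which this C sketch expresses; percent_of is a hypothetical helper, and the exact way sim counts matched tokens is an assumption here.

```c
#include <stddef.h>

/* Share of F's significant tokens that lie in runs also found in G, as a
   percentage of F's own length. Assumption: matched_f counts each token of F
   at most once, even if it takes part in several runs. */
double percent_of(size_t matched_f, size_t len_f) {
    if (len_f == 0)
        return 0.0;
    return 100.0 * (double)matched_f / (double)len_f;
}
```

Here percent_of(200, 200) gives 100 for "F consists for 100 % of G material", while percent_of(200, 20000) gives 1 for G relative to F, although the shared text is the same 200 tokens in both cases.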

Care has been taken to keep all internal processes linear in the length of the input, with the exception of the matching process, which is almost linear because it uses a hash table; various other tables are used for speed-up. If, however, there is not enough memory for the tables, they are discarded least important first, in which case the algorithms revert to their quadratic nature.
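
How a hash table keeps matching almost linear can be shown with a simplified C sketch. It indexes every window of r tokens of one file (r playing the role of the minimum run length set by -r) in a chained hash table; for a position in the other file it then follows only the chain with the same hash, verifies each candidate token by token and extends the match. The names, the hash function and the bucket count are assumptions, and this is not sim's actual data structure.

```c
#include <stddef.h>
#include <stdlib.h>

#define HSIZE 1024  /* number of hash buckets; an arbitrary choice for this sketch */

/* Chained index over every r-token window of file b: head[h] is the most
   recent window start with hash h, next[j] the previous one; -1 ends a chain. */
struct window_index {
    long head[HSIZE];
    long *next;
};

static size_t window_hash(const int *t, size_t r) {
    unsigned h = 0;
    for (size_t k = 0; k < r; k++)
        h = h * 31u + (unsigned)t[k];
    return h % HSIZE;
}

/* Indexes b[0..nb) for windows of r >= 1 tokens; returns -1 on bad r or no memory. */
int index_build(struct window_index *ix, const int *b, size_t nb, size_t r) {
    for (size_t h = 0; h < HSIZE; h++)
        ix->head[h] = -1;
    ix->next = NULL;
    if (r == 0)
        return -1;
    ix->next = malloc((nb > 0 ? nb : 1) * sizeof *ix->next);
    if (ix->next == NULL)
        return -1;
    for (size_t j = 0; j + r <= nb; j++) {
        size_t h = window_hash(b + j, r);
        ix->next[j] = ix->head[h];
        ix->head[h] = (long)j;
    }
    return 0;
}

void index_free(struct window_index *ix) {
    free(ix->next);
    ix->next = NULL;
}

/* Longest run starting at a[i] that also occurs in b and is at least r tokens
   long; returns its length (0 if none) and stores its start in b in *jbest.
   Only candidates in the matching hash chain are examined, and each is
   verified token by token, so hash collisions cannot cause false matches. */
size_t longest_run_at(const struct window_index *ix, const int *a, size_t na,
                      const int *b, size_t nb, size_t i, size_t r, size_t *jbest) {
    size_t best = 0;
    if (r == 0 || i + r > na)
        return 0;
    for (long j = ix->head[window_hash(a + i, r)]; j >= 0; j = ix->next[j]) {
        size_t len = 0;
        while (i + len < na && (size_t)j + len < nb && a[i + len] == b[(size_t)j + len])
            len++;
        if (len >= r && len > best) {
            best = len;
            *jbest = (size_t)j;
        }
    }
    return best;
}
```

If the next array cannot be allocated, a caller could fall back to trying every position of b, which is the quadratic behaviour the paragraph above describes.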

AUTHOR
Dick Grune, Vrije Universiteit, Amsterdam.

BUGS
Strong periodicity in the input text (like a table of N almost identical lines) causes problems. Sim tries to cope with this but cannot avoid giving approximately log N messages about it. The best advice is still to remove the offending files from the comparison.

Since sim uses lex(1) on some systems, it may dump core on any weird construction that overflows lex's internal buffers.

Vrije Universiteit Last change: 2001/11/13