mirror of
https://gitlab.com/freepascal.org/fpc/source.git
synced 2025-04-07 18:08:17 +02:00
581 lines
23 KiB
Plaintext
581 lines
23 KiB
Plaintext
2007-08-23 Dick Grune <dick@hydra.cs.vu.nl>
|
|
LICENSE.txt added.
|
|
|
|
2006-11-27 Dick Grune <dick@hydra.cs.vu.nl>
|
|
Removal of setbuff() for compatibility.
|
|
|
|
2005-01-17 Dick Grune <dick@blade014.cs.vu.nl>
|
|
Corrections by Jerry James <james@eecs.ku.edu>; ANSIizing, etc.
|
|
|
|
2004-08-05 Dick Grune <dick@blade014.cs.vu.nl>
|
|
Finished the 'percentage' option.
|
|
|
|
08-Nov-2001 Dick Grune
|
|
Begun to add a 'percentage' option, which will express the
|
|
similarity between two files in percents.
|
|
|
|
27-Sep-2001 Dick Grune
|
|
Split add_run() off from compare.c into add_run.c, to accomodate
|
|
different add_run()s, for different types of processing.
|
|
|
|
27-Nov-1998 Dick Grune
|
|
Installed a Miranda version supplied by Emma Norling (ejn@cs.mu.oz.au)
|
|
|
|
23-Feb-1998 Dick Grune
|
|
Renamed text.l to textlang.l for uniformity and to make room for
|
|
a possible module text.[ch].
|
|
|
|
Isolated a module for handling the token array from buff.[ch] to
|
|
tokenarray.[ch], and renamed buff.[ch] to text.[ch].
|
|
|
|
23-Feb-1998 Dick Grune
|
|
There is probably not much point in abandoning the nl_buff list
|
|
when running out of memory for TokenArray[]: each token costs 1
|
|
byte for the token and 4 bytes for the entry in
|
|
forward_references[], a total of 5 bytes. There are about 3
|
|
tokens to a line, together requiring 15 bytes, plus 1 byte in
|
|
nl_buff yields 16 bytes. So releasing nl_buff frees only 1/16 =
|
|
6.7 % of memeory.
|
|
|
|
Since the code is a bother, I removed it. Note that nl_buff is
|
|
still abandoned when the number of tokens in a line does not fit
|
|
in one unsigned char (but that is not very likely to happen).
|
|
|
|
|
|
21-Feb-1998 Dick Grune
|
|
Printing got into an infinite loop when the last line of the
|
|
input was not terminated by a newline AND contained tokens that
|
|
were included in a matching run.
|
|
This was due to a double bug: 1. the non-terminated line was not
|
|
registered properly in NextTextTokenObtained() / CloseText(),
|
|
and 2. the loop in pass 2 which sets the values of
|
|
pos->ps_nl_cnt was terminated prematurely when the file turned
|
|
out to be shorter than the list of pos-es indicated.
|
|
Both bugs were corrected, the first by supplying an extra
|
|
newline in CloseText() when one is found missing, and the second
|
|
by rewriting the list-parallel loop in pass 2.
|
|
|
|
02-Feb-1998 Dick Grune
|
|
Pascal does not differentiate between strings and characters
|
|
(strings of one character); this difference has been removed
|
|
from pascallang.l.
|
|
|
|
22-Jan-1998 Dick Grune
|
|
Detection of non-ASCII characters added. Since the lexical
|
|
analyser itself generates non-ASCII characters, the test must occur
|
|
earlier. We could replace the input routine of lex by a
|
|
checking routine, but with several lex-es going around, we want
|
|
a more lex-independent solution. To allow each language its own
|
|
restrictions about non-ASCII characters, the check is
|
|
implemented in the *lang.l files.
|
|
|
|
28-Nov-1997 Dick Grune
|
|
Changed the name of the C similarity tester 'sim' to 'sim_c', for
|
|
uniformity with sim_java, etc.
|
|
|
|
23-Nov-1997 Dick Grune
|
|
Java version finished; checked by Matty Huntjens and crew.
|
|
|
|
24-Jun-1997 Dick Grune
|
|
Started on a Java version, by copying the C version.
|
|
|
|
22-Jun-1997 Dick Grune
|
|
Modern lexical analysers, among which flex, read the entire input into
|
|
a buffer before they issue the first token. As a result, ftell() no
|
|
longer gives a usable indication of the position of a token in a file.
|
|
This pulls the rug from under the nl_buff mechanism in buff.c, which
|
|
is removed. We loose a valuable optimization this way, but there just
|
|
seems to be no way to keep it.
|
|
|
|
Note that this has nothing to do with the problem in MS-DOS of
|
|
character count and fseek position not being synchronized. That
|
|
problem has been solved on June 14, 1991 (which see) and the code has
|
|
been running OK since.
|
|
|
|
18-Jun-1997 Dick Grune
|
|
The thought has occurred to use McCreight's linear longest common
|
|
substring algorithm rather than the existing algorithm, which has a
|
|
small quadratic component. There are a couple of problems with this:
|
|
1. We need the longest >non-overlapping< common substring;
|
|
McCreight provides just the longest. It is not at all clear
|
|
how to modify the algorithm.
|
|
2. Once we have found our LCS, we want to find the
|
|
one-but-longest; it is far from obvious how to do that in
|
|
McCreight's algorithm.
|
|
3. Once we have found our LCS, we want to take one of its
|
|
copies out of the game, to suppress duplicate messages.
|
|
Again, it is difficult to see how to do that, without
|
|
redoing all the calculations.
|
|
4. McCreight's algorithm seems to require about two binary
|
|
tree nodes per token, say 8 bytes, which is double we
|
|
use now.
|
|
|
|
17-Jun-1997 Dick Grune
|
|
Did some experimenting with the hash function; it is still
|
|
pretty bad: the simple-minded second sweep through
|
|
forward_references easily removes another 80-99% of false hits.
|
|
Next, a third sweep that does a full comparison will remove another
|
|
large percentage.
|
|
|
|
So I have left in the second sweep in all cases.
|
|
|
|
There are a couple of questions here:
|
|
1. Can we find a better hash function, or will we forever need a
|
|
second sweep?
|
|
2. Does it actually matter, or will we loose on more expensive
|
|
hashing what we gain by having a better set of forward
|
|
references in compare.c?
|
|
|
|
|
|
16-Jun-1997 Dick Grune
|
|
Cleaned up sim.h and renamed aiso.[ch] to runs.[ch] since they
|
|
are instantiations of the aiso module concerned with runs.
|
|
Aiso.[spc|bdy] stays aiso.[spc|bdy], of course.
|
|
|
|
16-Jun-1997 Dick Grune
|
|
Redid largest_function() in algollike.c.
|
|
Corrected bug in CheckRun; it now always removes NonFinals from
|
|
the end, even when it has first applied largest_function().
|
|
|
|
15-Jun-1997 Dick Grune
|
|
Reorganized the layers around the input file. There were and
|
|
still are three layers: lang, stream and buff.
|
|
|
|
Since the lex_X variables are hoisted unchanged through the levels
|
|
lang, stream, and buff, to be used by pass1, pass2, etc., they
|
|
have to be placed in a module of their own.
|
|
|
|
The token-providing module 'lang' has three interfaces:
|
|
- lang.h, which provides access to the lowest-level token
|
|
routines, to be used by the next level.
|
|
- lex.h, which provides the lex variables, to be used by
|
|
all and sundry.
|
|
- language.h, which provides language-specific info about
|
|
tokens, concerning their suitability as initial
|
|
and final tokens, to be used by higher levels.
|
|
|
|
This structure is not satisfactory, but it is also unreasonable
|
|
to combine them in one interface.
|
|
|
|
There is no single lang.c; rather it is represented by the
|
|
various Xlang.c files generated from the Xlang.l files.
|
|
|
|
14-Jun-1997 Dick Grune
|
|
Added a Makefile zip entry to parallel the shar entry.
|
|
|
|
13-Jun-1997 Dick Grune
|
|
A number of simplifications, in view of better software and bigger
|
|
machines:
|
|
- Removed good_realloc from hash.c; I don't think there are
|
|
any bad reallocs left.
|
|
- Removed the option to run without forward_references.
|
|
On a 16Mb machine this means you have at least 2M tokens;
|
|
using a quadratic algorithm will take 4*10^6 sec. at an
|
|
impossible rate of 1M actions/sec., which is some 50 days.
|
|
Forget it.
|
|
- Renamed lang() to print_stream(), and incorporated it in sim.c
|
|
- Removed the MSDOS subdirectory mechanism in the Makefile.
|
|
- Removed the funny and sneaky double parameter expansion in
|
|
the call of idf_in_list().
|
|
|
|
12-Jun-1997 Dick Grune
|
|
Converted to ANSI C. Removed cport.h.
|
|
|
|
09-Jan-1995 Dick Grune
|
|
Decided not to do directories: they usually contain extraneous
|
|
files and doing sim * is simple enough anyway.
|
|
|
|
09-Sep-1994 Dick Grune
|
|
Added system.h to cater for the (few) differences between Unix and DOS.
|
|
The #define int32 is also supplied there.
|
|
|
|
05-Sep-1994 Dick Grune
|
|
Added many prototype declarations using cport.h.
|
|
Added a depend entry to the Makefile.
|
|
|
|
31-Aug-1994 Dick Grune
|
|
All these changes require a 32 bit integer; introduced a #define
|
|
int32, set from the command line in the Makefile.
|
|
|
|
25-Aug-1994 Dick Grune
|
|
It turned out that one of the most often called routines was .rem,
|
|
from idf_hashed() in idf.c. Moving the % out of the loop chafed off
|
|
another 6% and reduced the time to 18.4 sec.
|
|
|
|
19-Aug-1994 Dick Grune
|
|
With very large files (e.g., concatenated /usr/man/man1/*) the fixed
|
|
built-in hash table size of 10639 is no longer satisfactory. Hash.c
|
|
now finds a prime about 8 times smaller than the text_size to use
|
|
for hash table size; this achieves optimal speed-up without gobbling
|
|
up too much memory. Reduced the time for the above file from 30.2
|
|
sec. to 19.6 sec.
|
|
For checking, the same test was run with all hashing off; it took
|
|
20h 27m 19s = 73639 sec. But it worked.
|
|
|
|
11-Aug-1994 Dick Grune
|
|
For large values of MinRunSize (>1000) a large part of the time
|
|
(>two-thirds) was spent in calculating the hash values for each
|
|
position in the input, since the cost of this calculation was
|
|
proportional to MinRunSize. We now sample a maximum of 24 tokens
|
|
from the input string to calculate the hash value, and avoid
|
|
overflow. On my workstation, this reduces the time for
|
|
sim_text -r 1000 -n /usr/man/man1/*
|
|
from 60 sec to 21 sec.
|
|
|
|
30-Jun-1992 Dick Grune,kamer R4.40,telef. 5778
|
|
There was an amazing bug in buff.c where NextTextToken() for pass 2
|
|
omitted to set lex_token to EOL when retrieving newline info from
|
|
nl_buff. Worked until now!?!
|
|
|
|
23-Sep-1991 Dick Grune
|
|
Cport.h introduced, CONST and *.spc only.
|
|
|
|
17-Sep-1991 Dick Grune
|
|
The position-sorting routine in pass2.c has been made into a
|
|
separate generic module.
|
|
|
|
14-Jun-1991 Dick Grune (dick@cs.vu.nl) at dick.cs.vu.nl
|
|
Replaced the determination of the input position through counting
|
|
input characters by calls of ftell(); this is cleaner and the other
|
|
method will never work on MSDOS.
|
|
|
|
30-May-1989 Dick Grune (dick) at dick
|
|
Replaced the old top-100 module (which had been extended to top-10000
|
|
already anyway) by the new aiso (arbitrary-in sorted-out) module.
|
|
This caused a considerable speed-up on the Mod2 test bed:
|
|
%time cumsecs #call ms/call name
|
|
17.9 99.20 7209 13.76 _InsertTop
|
|
0.3 1.37 7209 0.19 _InsertAiso
|
|
It turns out that malloc() is not a serious problem, so no special
|
|
version for the aiso module is required.
|
|
|
|
23-May-1989 Dick Grune (dick) at dick
|
|
No more uncommented comment at the end of preprocessor lines, to
|
|
conform to ANSI C.
|
|
|
|
23-May-1989 Dick Grune (dick) at dick
|
|
Added code in the X.l files to (silently) reject characters over 0200.
|
|
This does not really help, since lex stops on null chars. Ah, well.
|
|
|
|
19-May-1989 Dick Grune (dick) at dick
|
|
Made the token as handled by sim into an abstract data type, for
|
|
aesthetic reasons. Sign extension is still a problem.
|
|
|
|
03-May-1989 Dick Grune (dick) at dick
|
|
Optimized lcs() by first checking from the end if a sufficiently long
|
|
run is present; if in fact only the first 12 tokens match, chances
|
|
are good that you can reject the run right away by first testing
|
|
the 20th token, then the 19th, and so on.
|
|
|
|
21-Apr-1989 Dick Grune (dick) at dick
|
|
A run of sim_m2 finding 7209 similarities raised the question of
|
|
the appropriateness of the linear sort in sort_pos(). Profiling
|
|
showed that in this case sorting takes all of 7.5 % of the total
|
|
time. Putting the word register in in the right places in
|
|
sort_pos() lowered this number to 4.6%.
|
|
|
|
20-Apr-1989 Dick Grune (dick) at dick
|
|
Moved the test for MayBeStartOfRun() from compare.c (where it is
|
|
done again and again) to hash.c, where its effect is incorporated in
|
|
the forward reference chain.
|
|
|
|
14-Apr-1989 Dick Grune (dick) at dick
|
|
Replaced elem_of() by bit tables, headers[] and trailers[], to be
|
|
prefilled from Headers[] and Trailers[] by a call of
|
|
InitLanguage(). This saves a few percents.
|
|
|
|
13-Apr-1989 Dick Grune (dick) at dick
|
|
Implemented the -e and the -S option, by putting yet another loop
|
|
in compare.c
|
|
|
|
13-Apr-1989 Dick Grune (dick) at dick
|
|
The -- option (displaying the tokens) will now handle more than one
|
|
file.
|
|
|
|
20-Jan-1989 Dick Grune (dick) at dick
|
|
After the modification of 19-Dec-88, 12% of the time went into
|
|
updating the positions in the chunks, as they were produced by the
|
|
matching process. This matching process identifies runs (matches)
|
|
by token position, which has to be recalculated to lseek positions
|
|
and line numbers. To this end the files are read again, and for
|
|
each line all positions found were checked to see if they applied
|
|
to this line; this was a awfully stupid algorithm, but since much
|
|
more time was spent elsewhere, it did not really matter. With all
|
|
the saving below, however, it had risen to second position, after
|
|
yylook() with 35%.
|
|
|
|
Th solution was, to sort the positions in the same order in which
|
|
they would be met by the reading of the files. The process is then
|
|
linear. This required some extensive hacking in pass2.c
|
|
|
|
06-Jan-1989 Dick Grune (dick) at dick
|
|
The modification below did indeed save 25%. The newline information
|
|
is now reduced to 2 shorts; 2 chars were not enough, since some
|
|
lines are longer that 127 bytes, and a char and a short together
|
|
take as much room as two shorts.
|
|
|
|
19-Dec-1988 Dick Grune (dick) at dick
|
|
To avoid reading the files twice (which is still taking 25% of the
|
|
time), the first pass will now collect newline information for the
|
|
second pass in a buffer called nl_buff[]. This buffer, and the
|
|
original token buffer now named TokenArray[], are managed by the file
|
|
buff.c, which implements a layer between stream.h and pass?.c. This
|
|
layer provides OpenText(), NextTextToken() and CloseText(), each
|
|
with a parameter telling which pass it is.
|
|
|
|
06-Dec-1988 Dick Grune (dick) at dick
|
|
As an introduction to removing the second pass altogether, the
|
|
first and second scan were unified, i.e., their input is identical.
|
|
This also means that the call sim -[12] has now been replaced by
|
|
one call: sim --.
|
|
|
|
23-Sep-1988 Dick Grune (dick) at dick
|
|
Dynamic allocation of line buffers in pass 3. This removes the
|
|
restriction on the page width.
|
|
|
|
22-Sep-1988 Dick Grune (dick) at dick
|
|
In order to give better messages on incorrect calls to sim, the
|
|
whole option handling has been concentrated in a file option.c and
|
|
separated from the options and their messages themselves. See sim.c
|
|
|
|
07-Sep-1988 Dick Grune (dick) at dick
|
|
For long text sequences (say hundreds of thousands of tokens),
|
|
the hashing is not really efficient any more since too many
|
|
spurious matches occur. Therefore, the forward reference table is
|
|
scanned a second time, eliminating from any chain all references to
|
|
runs that do not end in the same token. For the UNIX manuals this
|
|
reduced the number of matches from 91.9% to 1.9% (of which 0.06%
|
|
were genuine).
|
|
|
|
30-Aug-1988 Dick Grune (dick) at dick
|
|
For compatibility, NextTop has been rewritten to yield true or
|
|
false and to accept a pointer to a run as a parameter.
|
|
|
|
30-Aug-1988 Dick Grune (dick) at dick
|
|
When trying to find line-number and lseek position to beginnings
|
|
and ends of runs found, the whole set of runs was scanned for each
|
|
line in each file. Now only the runs belonging to that file are
|
|
scanned; to this end another linked list has been braided through
|
|
the data structures (tx_chunk).
|
|
|
|
30-Aug-1988 Dick Grune (dick) at dick
|
|
The longest-common-substring algorithm was called much too often,
|
|
mainly because the forward references made by hashing suffered from
|
|
pollution. If you have say 1000 tokens and a hash range of say
|
|
10000, about 5 % of the hashings will be false matches, i.e. 50
|
|
matches, which is quite a lot on a natural number of 2 to 3 matches.
|
|
Improved by doing a second check in make_forw_ref().
|
|
|
|
12-Jun-1988 Dick Grune (dick) at dick
|
|
Installed a Lisp version supplied by Gertjan Akkerman.
|
|
|
|
15-Jan-1988 Dick Grune (dick) at dick
|
|
Added register declarations all over the place.
|
|
|
|
14-Jan-1988 Dick Grune (dick) at dick
|
|
It is often useful to match a piece of code exactly, especially
|
|
when function names (or, even more so, macro names) are involved.
|
|
What one would want is having all the letters in the text array,
|
|
but this is kind of hard, since each entry is one lexical item.
|
|
This means that under the -F option each letter is a lex item, and
|
|
normally each tag is a lex item; this requires two lex grammars in
|
|
one program; no good. So, on the -F flag we hash the identifier
|
|
into one lex item, which is hopefully characteristic enough. It
|
|
works.
|
|
|
|
30-Sep-1987 Dick Grune (dick) at dick
|
|
Some cosmetics.
|
|
|
|
31-Aug-1987 Dick Grune (dick) at dick
|
|
Moved the whole thing to the SUN (while testing on a VAX and a
|
|
MC68000)
|
|
|
|
16-Aug-1987 Dick Grune (dick) at dick
|
|
The test program lang.c is no longer a main program, but rather a
|
|
subroutine called in main() in sim.c, through the command line
|
|
option -1 or -2.
|
|
|
|
23-Apr-1987 Dick Grune (dick) at tjalk
|
|
Changed the name 'index' into 'elem_of', because of compatibility
|
|
problems on different Unices. Added a declaration for it in
|
|
the file algollike.c
|
|
|
|
10-Mar-1987 Dick Grune (dick) at tjalk
|
|
Changed the printing of the header of a run so that:
|
|
- long file names will no longer be truncated
|
|
- the run length is displayed
|
|
|
|
27-Jan-1987 Dick Grune (dick) at tjalk
|
|
Switched it right off again! Getting them in textual order is
|
|
still more unpleasant, since now you cannot find the important
|
|
ones if their are more than a few runs.
|
|
|
|
27-Jan-1987 Dick Grune (dick) at tjalk
|
|
Going to experiment with leaving out the sorting; just all the
|
|
runs, in the order we meet them. Should be as good or better.
|
|
Comparisons of more than 100 runs are very rare anyway, so the
|
|
fact that those over a 100 are rejected is probably no great
|
|
help. Getting them in a funny order is a nuisance, however. Down
|
|
with featurism. Just to be safe, present version saved as
|
|
870127.SV
|
|
|
|
26-Dec-1986 Dick Grune (dick) at tjalk
|
|
Names of overall parameters in params.h changed to more uniformity.
|
|
|
|
26-Dec-1986 Dick Grune (dick) at tjalk
|
|
Since the top package and the instantiation system have grown
|
|
apart so much, I have integrated the old top package into sim,
|
|
i.e., done the instantiation by hand. This removes top.g and
|
|
top.p, and will save outsiders from wondering what is going on
|
|
here.
|
|
|
|
23-Dec-1986 Dick Grune (dick) at tjalk
|
|
Use setbuf to print unbuffered while reading the files (lex core
|
|
dumps, other mishaps) and print buffered while printing the real
|
|
output (for speed).
|
|
|
|
30-Nov-1986 Dick Grune (dick) at tjalk
|
|
Various small changes in *lang.l:
|
|
; ignored conditionally (!options['f'])
|
|
new format for tokens in struct idf
|
|
cosmetics: macro Layout, macro UnsafeComChar, no \n
|
|
in character denotations, more than one char
|
|
in a char denotations in Pascal, etc.
|
|
|
|
30-Nov-1986 Dick Grune (dick) at tjalk
|
|
Added a Modula-2 version.
|
|
|
|
29-Nov-1986 Dick Grune (dick) at tjalk
|
|
Restricting tokens to the ASCII95 character set is really too
|
|
severe: some languages have many more reserved words (COBOL!).
|
|
Corrected this by adding a couple of '&0377' in strategic places.
|
|
Added a routine for printing the 8-bit beasties: show_token().
|
|
|
|
15-Aug-1986 Dick Grune (dick) at tjalk
|
|
Since the ; is superfluous in both C and Pascal, it is now ignored
|
|
by clang.l and pascallang.l
|
|
|
|
15-Aug-1986 Dick Grune (dick) at tjalk
|
|
The code in CheckRun in Xlang.l was incorrect in that it used the
|
|
wrong criterion for throwing away trailing garbage. I've taken
|
|
CheckRun etc. out of the Xlang.l-s and turned them into a module
|
|
"algollike.c". Made a cleaner interface and avoided duplication of
|
|
code.
|
|
|
|
02-Jul-1986 Dick Grune (dick) at tjalk
|
|
Looking backwards in compare.c to see if we are in the middle of a
|
|
run is an atavism. You can be and still be all right, e.g., if
|
|
part of the run was rejected as not fitting for a function.
|
|
Removed from compare.c.
|
|
|
|
10-Jun-1986 Dick Grune (dick) at tjalk
|
|
The function hash_code() in hash.c could yield a negative value;
|
|
corrected.
|
|
|
|
09-Jun-1986 Dick Grune (dick) at tjalk
|
|
Changed the name of the file text.h to sim.h. Sim.h is more
|
|
appropriate and text.h sounds as if it belongs to text.l, with
|
|
which it has no connection.
|
|
|
|
04-Jun-1986 Dick Grune (dick) at tjalk
|
|
After having looked at a couple of hash functions and having done
|
|
some calculations on the number of duplicates normally encountered
|
|
in hash functions, I conclude that our function in hash.c is quite
|
|
good. Removed all the statistics-gathering stuff.
|
|
|
|
Actually, hash_table[] is not the hash table at all; it is a
|
|
forward reference table; likewise, the real hash table was called
|
|
last[]. Renamed both.
|
|
|
|
There is a way to keep the hash table local without putting it on
|
|
the stack: use malloc().
|
|
|
|
02-Jun-1986 Dick Grune (dick) at tjalk
|
|
Added a simple lex file for text: each word is condensed into a
|
|
hash code which is mapped on the ASCII95 character set. This
|
|
turns out to be quite effective.
|
|
|
|
01-Jun-1986 Dick Grune (dick) at tjalk
|
|
The macros cput(tk) and c_eol() both have a return in them, so any
|
|
code after them may not be executed -> they have to be last in an
|
|
entry. But they weren't, in many places; I can't imagine why it
|
|
all worked nevertheless. They have been renamed return_tk(tk) and
|
|
return_eol() and the entries have been restructured.
|
|
|
|
30-May-1986 Dick Grune (dick) at tjalk
|
|
Moved the string and character entries in clang.l and pascallang.l
|
|
to a place behind the comment entries, to avoid strings (and
|
|
characters) being recognized inside comments. I first thought
|
|
this would not happen, but as Maarten pointed out, if both
|
|
interpretations have the same length, lex will take the first
|
|
entry. Now this will happen if the string occupies the whole line
|
|
that would otherwise be taken as a comment. In short,
|
|
/*
|
|
"hallo"
|
|
*/
|
|
would return ".
|
|
|
|
28-May-1986 Dick Grune (dick) at tjalk
|
|
Added -d option, to display the output in diff(1) format (courtesy
|
|
of Maarten van der Meulen).
|
|
Rewrote the lexical parsing of comments (likewise courtesy Maarten
|
|
van der Meulen).
|
|
|
|
20-May-1986 Dick Grune (dick) at tjalk
|
|
Added a routine to convert identifiers to lower case in
|
|
pascallang.l .
|
|
|
|
19-May-1986 Dick Grune (dick) at tjalk
|
|
Added -a option, to quickly check antecedent of a file (courtesy
|
|
of Maarten van der Meulen).
|
|
|
|
18-May-1986 Dick Grune (dick) at tjalk
|
|
Brought everything under RCS/CVS.
|
|
|
|
18-Mar-1986 Dick Grune (dick) at tjalk
|
|
Added modifications by Paul Bame (hp-lsd!paul@hp-labs) to have an
|
|
option -w to set the page width.
|
|
|
|
21-Feb-1986 Dick Grune (dick) at tjalk
|
|
Took array last[N_HASH] out of make_hash() in hash.c, due to stack
|
|
overflow on the Gould (reported by George Walker
|
|
tekig4!georgew@mcvax.uucp)
|
|
|
|
16-Feb-1986 Dick Grune (dick) at tjalk
|
|
Corrected some subtractions that caused unsigned ints to turn
|
|
pseudo-negative. (Reported by jaap@mcvax)
|
|
|
|
11-Jan-1986 Dick Grune (dick) at tjalk
|
|
Touched up for distribution.
|
|
|
|
10-Jan-1986 Dick Grune (dick) at tjalk
|
|
Fill_line was not called for empty lines, which caused them to be
|
|
printed as repetitions of the previous line.
|
|
|
|
24-Dec-1985 Dick Grune (dick) at tjalk
|
|
Reduced hash table to a single array of indices; it is used only
|
|
in one place, which makes it very easy to make it (the hash table)
|
|
optional. General tune-up of everything. This seems to be
|
|
another stable "final" version.
|
|
|
|
14-Dec-1985 Dick Grune (dick) at tjalk
|
|
Some experiments with hash formulas:
|
|
h = (h OP CST) + *p++ OP CST yields right wrong
|
|
* 96 - 32 205 562
|
|
* 96 - 2 205 560
|
|
* 96 205 560
|
|
* 97 205 559
|
|
<< 0 66 3128
|
|
<< 1 203 555
|
|
<< 2 205 536
|
|
<< 7 203 540
|
|
Conclusion: it doesn't matter, unless you do it wrong.
|
|
|
|
01-Oct-1983 Dic8k Grune (dick) at vu44
|
|
Oldest known files.
|
|
|
|
# This file is part of the software similarity tester SIM.
|
|
# Written by Dick Grune, Vrije Universiteit, Amsterdam.
|
|
# $Id: ChangeLog,v 2.12 2007/08/27 09:57:30 dick Exp $
|
|
#
|