PGNTRIM5
BACKGROUND
-------------------
If no input and output filenames are given in the command tail, the user is
prompted for them. There are two additional (OPTIONAL) command tail parameters:
1. /MPL:1 to /MPL:7 sets the number of moves per output line.
   /MPL:1 is nice for students to add comments for review by an instructor.
2. /ALLTAGS preserves all [Tag records; by default only the 12 most important
   tags are kept.
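As a rough illustration of the two optional parameters, here is a minimal Python sketch of how such a command tail could be split apart. It is not PGNTRIM5's actual code: the function name parse_command_tail is hypothetical, and treating the switches as case-insensitive and rejecting unknown switches are assumptions.

```python
import re

def parse_command_tail(args):
    """Split a command tail into filenames and the two optional switches.

    Hypothetical helper; PGNTRIM5's real parsing rules are not documented here.
    """
    files = []
    moves_per_line = None   # None means "no /MPL switch given"
    all_tags = False
    for arg in args:
        upper = arg.upper()             # assumption: switches ignore case
        mpl = re.fullmatch(r"/MPL:([1-7])", upper)
        if mpl:
            moves_per_line = int(mpl.group(1))
        elif upper == "/ALLTAGS":
            all_tags = True
        elif arg.startswith("/"):
            # assumption: anything else starting with "/" is an error
            raise ValueError(f"unrecognized switch: {arg}")
        else:
            files.append(arg)
    return files, moves_per_line, all_tags
```

For instance, parse_command_tail(["infile1.pgn", "outfile.pgn", "/MPL:5"]) yields the two filenames, 5, and False.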
PGNTRIM5.EXE is a freeware Windows utility that corrects most PGN syntax errors
and directs games which need human review into a separate output file named
BADTRIM5.BAD. That file usually gives the user enough information to decide
whether to correct the input file and run again, or to accept the number of
games rejected from the new output file. PGNTRIM5 never changes the original
input file.
PGNTRIM5 does NOT detect illegal/impossible moves, but pgnscid or normal32 will
catch these.
I run pgntrim5 first against newly downloaded PGN files in order to clean up
common syntax problems and omissions, then I run the output file through pgnscid
or normal32 to catch any illegal moves. This approach greatly reduces the amount
of time needed to edit the PGN file for syntax errors before placing it into a
database.
After 35 years in the computer systems business, I can say with certainty that
I have never seen a standard that was followed by everyone. Or as the saying
goes: "It is wonderful to have a standard, and there are so many nice ones to
choose from!" In the case of PGN, there really are not many competing standards,
but human errors cause many difficulties: forgetting syntax (as when two knights
can move to the same square...called ambiguity), entering an f instead of a g,
or simply omitting part of a move by oversight, especially the "x" in captures,
such as entering cd4 instead of cxd4. It is all too common for a viewing program
to find an "illegal move", a self-capture, an impossible move (Nbg4 is my
favorite), etc. Most pawn promotions are to a Queen, Knight, or Rook, but
promotion to a Bishop is legal, and can result in more than one "light-squared"
Bishop, which would then also need ambiguity coding.
People have been submitting "annofritzed" PGN games to internet websites. They
often like to add their own comments and alternate moves. Such heavily commented
games frequently reach more than 8,000 characters in length...and all too often
contain unbalanced alternate move tokens (..)..) for which nesting IS permitted,
or they may contain extra curly brace tokens {..} delimiting comments. Fritz
will produce correct PGN syntax when autofritzing, but humans seem driven to
"improve or clarify" these, and they frequently end up with these tokens
unbalanced. These lengthy games are sent to BADTRIM5.BAD so that they can
receive human review.
Recently, Fischer-Random games have been appearing in PGN form on the internet.
This new form of chess is also called Chess960 because there are 960 possible
ways to arrange the pieces on the back rank as the game begins.
Chess magazines and books are not immune from typographical errors and omissions
such as leaving out moves entirely, leaving pieces off the diagrams, having two
black Kings and no White King, displaying entirely the wrong diagram, etc.
Persons collecting PGN chess game records do not want to end up with such problems
that show up while a game is being studied! Normalization programs can detect most
PGN problems, fix some, and tell the user about the others so that they can be
manually edited, or the game discarded.
A PGN game recording example follows. Heading records are called "tags", and
seven of them are required as a minimum....the first 7 shown below are required
in any PGN game. Other tags are optional, such as the "Opening" and "ECO" tags
shown. All tags must conform to the standard in order to be useful to a wide
audience...Each tag must begin with [ and end with ], the tag name must begin
with one uppercase letter, the text must be enclosed within quotation marks,
etc. It is somewhat surprising just how many PGN games have simple syntax errors
in the tag records!
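The tag rules just listed (opening [, closing ], a name starting with an uppercase letter, text in quotation marks) can be checked with a short pattern. The Python sketch below is only an illustration of those rules, not PGNTRIM5's own logic; the names TAG_LINE and parse_tag are hypothetical, and the exact set of characters allowed in a tag name is an assumption.

```python
import re

# [Name "value"]: the name starts with one uppercase letter and the value is
# enclosed in quotation marks. Allowing letters, digits and "_" in the rest
# of the name is an assumption made for this sketch.
TAG_LINE = re.compile(r'^\[([A-Z][A-Za-z0-9_]*)\s+"((?:[^"\\]|\\.)*)"\]$')

def parse_tag(line):
    """Return (name, value) for a well-formed tag line, else None."""
    match = TAG_LINE.match(line.strip())
    return (match.group(1), match.group(2)) if match else None
```

A line such as [ECO "B01"] parses to the pair ("ECO", "B01"), while a line with a lowercase tag name or a missing closing bracket returns None and could be flagged for review.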
PGNTRIM5 will detect unbalanced (...) alternate move delimiters, and unbalanced
{...} comment delimiters. Balanced alternate moves and comments (including
nested comments, which are legal) will be removed by PGNTRIM5. Unbalanced pairs
of these will be rejected by PGNTRIM5, and the game sent to the BADTRIM5.BAD
text file for review.
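One simple way to detect unbalanced delimiters is to keep a depth counter for each kind. The Python sketch below illustrates that idea under stated assumptions and is not taken from PGNTRIM5: the function name delimiters_balanced is hypothetical, braces are counted as nestable to match the description above (the PGN standard itself says brace comments do not nest), and parentheses inside a comment are ignored.

```python
def delimiters_balanced(movetext):
    """Return True if (...) and {...} are balanced in the movetext.

    Hypothetical helper. Parentheses that appear inside a {...} comment are
    treated as ordinary comment text (an assumption of this sketch).
    """
    paren_depth = 0
    brace_depth = 0
    for ch in movetext:
        if ch == "{":
            brace_depth += 1
        elif ch == "}":
            brace_depth -= 1
            if brace_depth < 0:        # closing brace with no opener
                return False
        elif brace_depth == 0:         # only count parens outside comments
            if ch == "(":
                paren_depth += 1
            elif ch == ")":
                paren_depth -= 1
                if paren_depth < 0:    # closing paren with no opener
                    return False
    return paren_depth == 0 and brace_depth == 0
```

A game for which this check fails would be a candidate for the reject file rather than for automatic repair.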
Refer to the TEST.PGN file which accompanies distribution of PGNTRIM5. This may be
run through PGNTRIM5 to show the breadth of syntax errors it will correct.
Normalization programs detect deviations from the standard, and either fix the
problem, notify the user, or both. Missing tags, illegal moves, or incomplete
moves such as B7, a8, or Rx cannot be fixed and are simply reported to the user
for editing or discarding the game.
Avoid using a semicolon ANYWHERE in a PGN game file; the standard intends this
to signal that the rest of the line is comments, and not essential. Semicolons
within tags cause a tag syntax error. "Handling" semicolons within tags
introduces other problems for the normalization programmer.
By normalizing PGN files before inserting them into your databases, you will
improve your enjoyment considerably by avoiding syntax error interruptions.
[ECO "B01"]
Windows will treat the first one as end-of-file regardless of the length specified
in the directory. XEOFF.EXE will safely remove the extra end-of-file characters,
and it will remove all form-feed characters in that file. There is no need for
form feeds in a PGN file.
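For readers without XEOFF.EXE, the same kind of cleanup can be sketched in a few lines of Python. This is not XEOFF's implementation: it assumes the end-of-file character meant here is Ctrl-Z (byte 0x1A), it removes every such byte rather than only the "extra" ones, and the function name strip_eof_and_formfeeds is hypothetical.

```python
def strip_eof_and_formfeeds(data: bytes) -> bytes:
    """Remove Ctrl-Z (0x1A) and form-feed (0x0C) bytes from file contents.

    Assumption: 0x1A is the end-of-file character referred to in the text.
    """
    return data.replace(b"\x1a", b"").replace(b"\x0c", b"")
```

To leave the input untouched, read the original in binary mode and write the cleaned bytes to a new file.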
PGNTRIM5.EXE
--------------------
If you are collecting PGN files from internet sources, you will find PGNTRIM5
very useful in saving you much time and adding to your enjoyment when viewing
games with a viewer or database such as SCID, Bookup, or Chessbase. While it
does not catch impossible or illegal moves, it will catch the vast majority of
errors which frequently occur in PGN data, and it will correct most of those
syntax errors and some omissions, such as changing [Round ""] into [Round "?"].
If any of the seven required tag records are missing or illegible, then the game
is sent into the rejections file BADTRIM5.BAD.
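The required-tag check and the [Round ""] repair can be pictured with the Python sketch below. It is an illustration only, not PGNTRIM5's code: the seven names are the Seven Tag Roster from the PGN standard, the helper name check_required_tags is hypothetical, and repairing only an empty Round value is an assumption based on the single example given above.

```python
import re

# The seven required tags (the "Seven Tag Roster" of the PGN standard).
SEVEN_TAG_ROSTER = ("Event", "Site", "Date", "Round", "White", "Black", "Result")
TAG = re.compile(r'^\[(\w+)\s+"(.*)"\]\s*$')

def check_required_tags(tag_lines):
    """Return (tags, missing): parsed tags and any required names not found."""
    tags = {}
    for line in tag_lines:
        match = TAG.match(line)
        if match:
            tags[match.group(1)] = match.group(2)
    missing = [name for name in SEVEN_TAG_ROSTER if name not in tags]
    # Repair the one omission the text mentions: an empty Round value.
    if tags.get("Round") == "":
        tags["Round"] = "?"
    return tags, missing
```

A non-empty missing list corresponds to the case where the game would be sent to the rejections file.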
Games which cannot be fixed are placed into a separate output file for editing.
The original input file is NEVER changed by PGNTRIM5; a separate output file of
normalized PGN is always created (unless all the games go into the reject file)!
Game images in BADTRIM5.BAD contain an additional tag record titled [Warning
which provides the reason for placing the game image into the reject file. If
you edit the rejections file directly, you may delete these [Warning tags.
As a rule, processing continues after an error, but if a game is unusually
stinky and causes a fatal error, then the game being processed when the fatal
error occurred is written to fatalerr.pgn. A game sent into fatalerr.pgn should
be located within the original input file and edited there, or removed, before
reprocessing the input file.
PGNTRIM5 no longer uses a settings file (as did previous versions) since
permitting curly-brace comments, alternate move sequences, etc. greatly
compounded the programming difficulty of trying to fix syntax errors. The
desired number of moves per line in the output file, however, may still be
specified as a command line argument (or parameter). For example, to obtain
5 moves per output line:
    pgnTRIM5 infile1.pgn outfile.pgn /MPL:5
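To picture what a moves-per-line setting does to the output, here is a small Python sketch that re-wraps movetext. It is not PGNTRIM5's formatter: the function name wrap_moves is hypothetical, a "move" is assumed to mean a full move (White's and Black's), and move numbers are assumed to be separate tokens such as "1." followed by a space.

```python
def wrap_moves(movetext, moves_per_line):
    """Re-wrap movetext so each line holds at most moves_per_line full moves.

    Assumption: move numbers are standalone tokens like "12." (not "12.e4").
    """
    lines, current, count = [], [], 0
    for token in movetext.split():
        is_move_number = token.endswith(".") and token.rstrip(".").isdigit()
        if is_move_number and count == moves_per_line:
            lines.append(" ".join(current))   # line is full, start a new one
            current, count = [], 0
        if is_move_number:
            count += 1
        current.append(token)
    if current:
        lines.append(" ".join(current))
    return "\n".join(lines)
```

With a setting of 1, each move pair lands on its own line, which leaves room for the handwritten comments mentioned for /MPL:1.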
PGNTRIM5 will remove all {...} and (...) and ? and ! and $nnn commentary
data due to the high percentage of syntactical errors accompanying these.
If you want those comments, keep a copy of the input file under another file
name such as 04Linar.sav or 04Linar.cmt or some such. PGNTRIM5 never changes
the input file, but you might do so if you remove error games, or otherwise
edit the file yourself....so have a naming scheme to save files you want to
be able to return to. 04Linar.001, 04Linar.002, etc. will accomplish this.
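The kind of stripping described above can be sketched in Python as follows. This is an illustration under assumptions, not PGNTRIM5's code: the function name strip_commentary is hypothetical, the input is movetext only (tag lines are not passed in), and unbalanced delimiters raise an error instead of being routed to a reject file.

```python
import re

def strip_commentary(movetext):
    """Drop {...} comments, (...) alternate moves, $nnn codes, and ! / ? marks.

    Hypothetical helper; raises ValueError if the delimiters are unbalanced.
    """
    kept = []
    brace_depth = 0
    paren_depth = 0
    for ch in movetext:
        if ch == "{":
            brace_depth += 1
        elif ch == "}":
            brace_depth -= 1
            if brace_depth < 0:
                raise ValueError("unbalanced }")
        elif brace_depth > 0:
            continue                     # inside a comment: skip everything
        elif ch == "(":
            paren_depth += 1
        elif ch == ")":
            paren_depth -= 1
            if paren_depth < 0:
                raise ValueError("unbalanced )")
        elif paren_depth == 0:
            kept.append(ch)              # ordinary movetext outside any variation
    if brace_depth != 0 or paren_depth != 0:
        raise ValueError("unbalanced delimiters")
    text = "".join(kept)
    text = re.sub(r"\$\d+", "", text)    # numeric annotation codes such as $14
    text = re.sub(r"[!?]+", "", text)    # move evaluation marks such as !? or ??
    return " ".join(text.split())        # collapse the leftover whitespace
```

Because everything between the delimiters is discarded, keeping a saved copy of the commented input, as suggested above, is the only way to get the annotations back.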
Suggestion: I use .TRM (for trim) as the output filetype. For example:
    pgntrim5 Kasparov.pgn kasparov.trm
...then if the .BAD reject file contains little or nothing from this, I rename
the original file, copy the trimmed output over the original name, and delete
the trimmed file:
    ren Kasparov.pgn Kasparov.sav
    copy Kasparov.trm Kasparov.PGN /y
    del Kasparov.trm
There is no limit on the size of the input or output files, provided you have
enough disk space for the output file. I have passed 43 million records
through PGNTRIM5 in one pass; however, files that large are not convenient
to edit, sort, or otherwise work with. But if you have a huge PGN file, and you
wish to normalize it all at once, you may do so with PGNTRIM5.
---------------------------------------------------------