are overlapping. I'm assuming this is due to a missing continue statement
or something similar inside the ColumnCollector. I added a couple of new IsNextRecord
exceptions in response to blank rows, but this may be causing more problems
than expected. Next step is probably to check the records returned, and verify
that nothing is being duplicated. Some of the duplicates may be filtered out
by the RecordBuilder class, or during the fields filtering in the pyaccuwage-pdfparse
script (see: fields).
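
A rough sketch of the duplicate check mentioned above. It assumes each
returned record can be reduced to a hashable value (a tuple here); the
find_duplicates name is made up for illustration and is not part of
pyaccuwage.

```python
from collections import Counter


def find_duplicates(records):
    """Return each record that appears more than once, in first-seen order.

    Hypothetical helper: assumes records are hashable (e.g. tuples).
    """
    counts = Counter(records)
    return [record for record, seen in counts.items() if seen > 1]
```

An empty result would suggest the overlap is happening somewhere other
than in the returned records.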
to track the beginning/ending points for each record and append
continuation records onto the previous. There's some issue in
the pyaccuwage-pdfparse script causing it to have problems reading
the last record field in a record group. Maybe the record extractor
needs to discard the last failed ColumnCollector rather than return it
if it's determined to hold junk data?
The record builder seems to handle everything just fine.
Added a function to the field name parsing to replace ampersands
with an "and" string so as not to cause problems with variable names.
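
Roughly what the ampersand replacement looks like, as a minimal sketch.
The function name and the extra cleanup (lowercasing, collapsing other
punctuation to underscores) are assumptions for illustration, not the
actual code in the parser.

```python
import re


def field_name_to_identifier(name):
    """Turn a field title into something usable as a variable name.

    Hypothetical helper; only the '&' -> 'and' step comes from the note above.
    """
    # Replace ampersands first so "A & B" and "A&B" both become "a_and_b".
    name = name.replace("&", " and ")
    # Assumed cleanup: collapse every run of non-alphanumerics into one "_".
    name = re.sub(r"[^0-9a-zA-Z]+", "_", name.strip().lower())
    return name.strip("_")
```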
We encountered a problem with the parser where a description contained
a range value and the parser thought it was the beginning of a new field
definition. We should be able to exclude the incorrect range values
by looking at our last good range, and if the range does not continue
the previous range, then it is probably incorrect and can be discarded.
These changes can probably be performed in the tokenize section of the
parser.
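
A minimal sketch of the range check described above. It assumes ranges
arrive as (start, end) position pairs, that positions are 1-based, and
that real field definitions are contiguous (each one starts right after
the previous one ends). The filter_ranges name is hypothetical; the real
change would live wherever the tokenizer emits range tokens.

```python
def filter_ranges(ranges):
    """Keep only ranges that continue the last good range.

    Hypothetical helper: assumes 1-based, contiguous (start, end) pairs.
    """
    accepted = []
    last_end = 0  # nothing accepted yet, so the first range must start at 1
    for start, end in ranges:
        if start == last_end + 1 and end >= start:
            accepted.append((start, end))
            last_end = end
        # Anything else is treated as a range quoted inside a description
        # and is dropped.
    return accepted
```

For example, given (1, 2), (3, 10), (5, 7), (11, 20), the (5, 7) pair
does not continue from position 10 and would be discarded.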