Commit graph

18 commits

SHA1 Message Date
66573e4d1d update for 2023 p1220 parsing, stupid irs 2024-03-29 10:48:04 -04:00
bfd43b7448 release 0.2018.2 2020-06-12 14:45:08 -05:00
431b594c1e add pyaccuwage-convert 2020-06-12 13:10:13 -05:00
b9982c3a21 added missing modeldef.py and fixed genfieldfill 2013-04-20 14:34:14 -05:00
9bbe100929 added pyaccuwage-genfieldfill 2013-04-20 14:31:09 -05:00
ef9f012bd2 added checkseq to scripts in setup.py 2013-04-20 13:03:09 -05:00
6bff5da58b pyaccuwage-checkseq now reports error lines when it encounters out-of-sequence field comments 2013-04-13 12:31:11 -05:00
c6df6c5452 Added pyaccuwage-checkseq. Everything works so far; currently
the sequence comments are returned as string tuples. The next step
is to convert these results to integers and make sure
they occur in the expected linear order.
2013-03-30 13:15:23 -05:00
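The check described above (string tuples converted to integers, then verified to run in linear order, with offending lines reported) can be sketched as follows. This is a minimal sketch, not the actual pyaccuwage-checkseq code: the comment format ("# 1-5" or "# 6" naming the record positions a field occupies) and the name check_sequence are assumptions for illustration.

```python
import re

# Hypothetical comment format: "# 1-5" (a range) or "# 6" (a single byte).
# The real pyaccuwage-checkseq format may differ.
SEQ_COMMENT = re.compile(r"#\s*(\d+)(?:\s*-\s*(\d+))?\s*$")


def check_sequence(lines):
    """Return (line_number, start, expected) for each out-of-sequence comment."""
    errors = []
    expected = 1  # record position the next field should start at
    for lineno, line in enumerate(lines, start=1):
        match = SEQ_COMMENT.search(line)
        if not match:
            continue
        # match.groups() is the string tuple, e.g. ("1", "5") or ("6", None);
        # convert it to integers, treating a single value as a one-byte field
        start = int(match.group(1))
        end = int(match.group(2) or match.group(1))
        if start != expected:
            errors.append((lineno, start, expected))
        # continue from the field actually found so one gap is reported once
        expected = end + 1
    return errors
```

For example, check_sequence(["a = x  # 1-5", "b = y  # 6", "c = z  # 9-10"]) reports the third line, because position 9 does not follow position 6.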
afc4138898 fixed automatic model generation inheritance 2013-02-19 16:06:11 -06:00
b40e736ae0 bumping version, improving field type guessing 2013-02-19 15:55:05 -06:00
e6e087ef38 Record merging seems to work now that header offsets have been corrected.
There's an issue parsing p1220 on line 2570. Making the parser ignore
full-width lines might fix the problem, if there's some way to measure
a row's length counting only single-spaced words.
2013-01-29 15:48:32 -06:00
31ff97db8a Almost have things working. It seems like some of the record results
are overlapping. I'm assuming this is due to a missing continue statement
or something similar inside the ColumnCollector. I added a couple of new IsNextRecord
exceptions in response to blank rows, but these may be causing more problems
than expected. The next step is probably to check the returned records and verify
that nothing is being duplicated. Some of the duplicates may be filtered out
by the RecordBuilder class, or during field filtering in the pyaccuwage-pdfparse
script (see: fields).
2012-11-20 16:05:36 -06:00
1c7533973a Parsing all the way through the pdf appears to work. Next we need
to track the beginning/ending points for each record and append
continuation records onto the previous one. There's some issue in
the pyaccuwage-pdfparse script causing it to have problems reading
the last record field in a record group. Maybe the record extractor
needs to dump the last failed ColumnCollector rather than return it
if it's determined to hold junk data?

The record builder seems to handle everything just fine.

Added a function to the field name parsing to replace ampersands
with an "and" string so as not to cause problems with variable names.
2012-11-13 15:53:41 -06:00
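The ampersand replacement in field-name parsing can be illustrated with a small helper. This is a sketch under assumptions: the name field_name_to_identifier and the underscore/lowercase conventions are hypothetical, not the project's actual function.

```python
import re


def field_name_to_identifier(name):
    """Turn a field description like "Name & Address" into a Python identifier."""
    # replace ampersands with "and" so they don't end up in variable names
    name = name.replace("&", " and ")
    # collapse every run of non-alphanumeric characters into one underscore
    name = re.sub(r"[^0-9a-zA-Z]+", "_", name).strip("_").lower()
    # Python identifiers can't start with a digit
    if name and name[0].isdigit():
        name = "_" + name
    return name
```

With this sketch, "Employer Name & Address" becomes employer_name_and_address.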
40fcbdc8b8 getting closer, added a FIXME to one of the fields. Having issues with columns in description fields 2012-07-17 15:44:28 -05:00
b3aed20388 fixed rangetoken issue with single byte values 2012-07-10 15:41:47 -05:00
e8145c5616 adding new pdf extract capability 2012-07-10 15:24:13 -05:00
04b3c3f273 Added pyaccuwage-parse script.
We encountered a problem with the parser where a description contained
a range value and the parser thought it was the beginning of a new field
definition. We should be able to exclude the incorrect range values
by comparing each one with our last good range: if a range does not continue
the previous range, then it is probably incorrect and can be discarded.

These changes can probably be performed in the tokenize section of the
parser.
2012-06-02 15:16:13 -05:00
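The proposed continuity check for range tokens might look like the sketch below. It assumes ranges have already been tokenized into (start, end) integer pairs; filter_ranges is a hypothetical name. Note that it trusts the first range it sees, so a bogus range at the very start would still get through.

```python
def filter_ranges(ranges):
    """Drop (start, end) ranges that don't continue the last good range."""
    good = []
    for start, end in ranges:
        # the first range seeds the sequence; later ranges must begin
        # immediately after the previous good range ends
        if not good or start == good[-1][1] + 1:
            good.append((start, end))
        # anything else is probably a range quoted inside a description
    return good
```

For example, filter_ranges([(1, 1), (2, 5), (10, 12), (6, 20)]) drops (10, 12), because it does not start at position 6.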
69da154e59 attempting to add a commandline script 2012-06-02 14:18:48 -05:00