pyaccuwage

Author	SHA1	Message	Date
Mark Riedesel	66573e4d1d	update for 2023 p1220 parsing, stupid IRS	2024-03-29 10:48:04 -04:00
Mark Riedesel	bfd43b7448	release 0.2018.2	2020-06-12 14:45:08 -05:00
Mark Riedesel	431b594c1e	add pyaccuwage-convert	2020-06-12 13:10:13 -05:00
Binh Nguyen	b9982c3a21	added missing modeldef.py and fixed genfieldfill	2013-04-20 14:34:14 -05:00
Binh Nguyen	9bbe100929	added pyaccuwage-genfieldfill	2013-04-20 14:31:09 -05:00
Binh Nguyen	ef9f012bd2	added checkseq to scripts in setup.py	2013-04-20 13:03:09 -05:00
Binh Nguyen	6bff5da58b	pyaccuwage-checkseq now reports error lines when it encounters out-of-sequence field comments	2013-04-13 12:31:11 -05:00
Binh Nguyen	c6df6c5452	Added pyaccuwage-checkseq. Everything works so far, currently the sequence comments are returned as string tuples. Next step is to take these results, convert them to integers, and make sure they occur in the expected linear order.	2013-03-30 13:15:23 -05:00
Binh Nguyen	afc4138898	fixed automatic model generation inheritance	2013-02-19 16:06:11 -06:00
Binh Nguyen	b40e736ae0	bumping version, improving field type guessing	2013-02-19 15:55:05 -06:00
Binh Nguyen	e6e087ef38	Record merging seems to work now that header offsets have been corrected. There's an issue parsing p1220 on line 2570. Maybe making the parser ignore full-width lines during parsing would fix the problem, if there's some way to check the length of a row, only counting single-spaced words?	2013-01-29 15:48:32 -06:00
Binh Nguyen	31ff97db8a	Almost have things working. It seems like some of the record results are overlapping. I'm assuming this is due to missing a continue or something inside the ColumnCollector. I added a couple new IsNextRecord exceptions in response to blank rows, but this may be causing more problems than expected. Next step is probably to check the records returned, and verify that nothing is being duplicated. Some of the duplicates may be filtered out by the RecordBuilder class, or during the fields filtering in the pyaccuwage-pdfparse script (see: fields).	2012-11-20 16:05:36 -06:00
Binh Nguyen	1c7533973a	Parsing all the way through the pdf appears to work. Next we need to track the beginning/ending points for each record and append continuation records onto the previous. There's some issue in the pyaccuwage-pdfparse script causing it to have problems reading the last record field in a record group. Maybe the record extractor needs to dump the last failed ColumnCollector rather than return it if it's determined to hold junk data? The record builder seems to handle everything just fine. Added a function to the field name parsing to replace ampersands with an "and" string so as not to cause problems with variable names.	2012-11-13 15:53:41 -06:00
Binh Nguyen	40fcbdc8b8	getting closer, added a FIXME to one of the fields. Having issues with columns in description fields	2012-07-17 15:44:28 -05:00
Binh Nguyen	b3aed20388	fixed rangetoken issue with single byte values	2012-07-10 15:41:47 -05:00
Binh Nguyen	e8145c5616	adding new pdf extract capability	2012-07-10 15:24:13 -05:00
Binh Nguyen	04b3c3f273	Added pyaccuwage-parse script. We encountered a problem with the parser where a description contained a range value and the parser thought it was the beginning of a new field definition. We should be able to exclude the incorrect range values by looking at our last good range, and if the range does not continue the previous range, then it is probably incorrect and can be discarded. These changes can probably be performed in the tokenize section of the parser.	2012-06-02 15:16:13 -05:00
Binh Nguyen	69da154e59	attempting to add a commandline script	2012-06-02 14:18:48 -05:00

18 commits
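
The range heuristic described in commit 04b3c3f273 (discard a range value that does not continue the last good range) could be sketched roughly as follows. This is only an illustration under stated assumptions, not the actual pyaccuwage tokenizer: the names `RANGE_RE`, `parse_range`, and `filter_continuing_ranges` are hypothetical, and the sketch assumes field positions start at 1 and that each field definition begins exactly one position after the previous one ends.

```python
import re
from typing import Iterable, List, Optional, Tuple

# Hypothetical pattern: a single position such as "7" or a range such as "2-5".
RANGE_RE = re.compile(r"^(\d+)(?:-(\d+))?$")


def parse_range(token: str) -> Optional[Tuple[int, int]]:
    """Return (start, end) for "7" or "2-5", or None if the token is not a range."""
    match = RANGE_RE.match(token.strip())
    if match is None:
        return None
    start = int(match.group(1))
    end = int(match.group(2)) if match.group(2) else start
    return (start, end) if start <= end else None


def filter_continuing_ranges(tokens: Iterable[str]) -> List[Tuple[int, int]]:
    """Keep only ranges that begin right after the last accepted range ends."""
    accepted: List[Tuple[int, int]] = []
    last_end = 0  # assumption: the first field starts at position 1
    for token in tokens:
        parsed = parse_range(token)
        if parsed is None:
            continue  # not a range at all, e.g. a word from a description
        start, end = parsed
        if start == last_end + 1:
            accepted.append(parsed)
            last_end = end
        # Otherwise the range does not continue the previous one, so it is
        # assumed to come from a description and is discarded.
    return accepted


if __name__ == "__main__":
    # "3-4" appears inside a description and does not follow "2-5".
    print(filter_continuing_ranges(["1", "2-5", "3-4", "6-10", "name", "11"]))
```

A range inside a description that happens to start right after the last good range would still be accepted, so this check narrows the problem rather than eliminating it.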