pyaccuwage

embee/pyaccuwage

Commit graph

9029659f98 update internal VERSION property (master) Mark Riedesel 2025-05-13 12:45:51 -05:00
1302de9df7 bump version to 0.2025.0 Mark Riedesel 2025-05-13 12:14:02 -05:00
fb8091fb09 change Iowa RS record state_employer_account_num from TextField to IntegerField Mark Riedesel 2025-05-13 12:09:49 -05:00
4408da71a9 mark some fields as optional Mark Riedesel 2024-04-10 09:41:10 -04:00
e0e4c1291d add min_length option to TextField for SSNs and stuff like that Mark Riedesel 2024-03-31 11:52:22 -04:00
5f4dc8b80f add 'blank' field option to allow empty text in required fields (default: false) Mark Riedesel 2024-03-31 11:14:16 -04:00
74b7935ced bump version to 2024 Mark Riedesel 2024-03-29 10:50:25 -04:00
66573e4d1d update for 2023 p1220 parsing, stupid irs Mark Riedesel 2024-03-29 10:46:01 -04:00
86f8861da1 encode record delimiter as ascii bytes when str is passed Mark Riedesel 2022-02-06 11:06:51 -06:00
042de7ecb0 import typing.Callable (python 3.10+) Mark Riedesel 2021-12-18 08:56:43 -05:00
f28cd6edf2 bump version 0.2020.0 Mark Riedesel 2021-09-03 07:48:24 -05:00
0bd82e09c4 Fix StaticField + tests for StaticField and unset optional TextField Mark Riedesel 2021-09-03 05:45:01 -05:00
558e3fd232 hopefully fix StaticField Mark Riedesel 2021-09-02 17:40:35 -05:00
7867a52a0c flipped args around like a simpleton Mark Riedesel 2021-01-29 16:26:26 -05:00
bfd43b7448 release 0.2018.2 Mark Riedesel 2020-06-12 14:45:08 -05:00
1f1d3dd9bb Merge branch 'conversion-support' Mark Riedesel 2020-06-12 13:13:28 -05:00
431b594c1e add pyaccuwage-convert (conversion-support) Mark Riedesel 2020-06-12 13:10:13 -05:00
8f86f76167 add format interchange functions, add tests, fix stuff Mark Riedesel 2020-06-12 13:07:41 -05:00
6af5067fca add option for record delimiter Mark Riedesel 2019-01-30 14:25:24 -06:00
250ca8d31f fix flubbed blank field specifier on StateTotalRecordIA Mark Riedesel 2019-01-28 13:29:14 -06:00
7ddcfcc1c3 clean up some indent Mark Riedesel 2019-01-27 10:36:37 -06:00
d08f1ca586 hopefully fix python 2 and 3 compatibility compat2 Mark Riedesel 2019-01-27 09:30:22 -06:00
6381f8b1ec bump version to 2018.01 Mark Riedesel 2019-01-26 16:11:24 -06:00
7c32cb0dd3 add StateTotalRecord for Iowa Mark Riedesel 2019-01-26 11:49:31 -06:00
056b10d953 add 'permitted_benefits_health' to RT and RO records for 2017 compat Mark Riedesel 2018-01-27 11:38:23 -06:00
5afdcd6a50 add 'permitted_benefits_health' to RT and RO records for 2017 Mark Riedesel 2018-01-27 11:38:23 -06:00
706c39f7bb CRLFField return binary data for get_data() Mark Riedesel 2017-10-29 10:41:52 -05:00
078273f49f fix json encoding by encoding bytes as ascii Mark Riedesel 2017-01-07 17:00:29 -06:00
9320c68961 use BytesIO to work with python3 Mark Riedesel 2017-01-07 14:52:33 -06:00
16bf2c41d0 run through 2to3 Mark Riedesel 2017-01-07 13:58:33 -06:00
961aedc0ae Added very important data cleaning Binh Nguyen 2014-02-01 15:10:40 -06:00
fc04a66869 Fixed debugging output Binh Nguyen 2014-02-01 12:52:44 -06:00
4eedab0e7c Added default record length Mark Riedesel 2013-10-11 00:19:20 -05:00
03ce460181 Completed JSON importer. Exported from import matches original data, must be working Binh Nguyen 2013-05-21 13:36:44 -05:00
7f9e5dbf65 added json encoder and partially functioning json decoder Binh Nguyen 2013-05-14 13:48:48 -05:00
b9982c3a21 added missing modeldef.py and fixed genfieldfill Binh Nguyen 2013-04-20 14:34:14 -05:00
9bbe100929 added pyaccuwage-genfieldfill Binh Nguyen 2013-04-20 14:31:09 -05:00
ef9f012bd2 added checkseq to scripts in setup.py Binh Nguyen 2013-04-20 13:03:09 -05:00
6bff5da58b pyaccuwage-checkseq now reports error lines when it encounters out-of-sequence field comments Binh Nguyen 2013-04-13 12:31:11 -05:00
c6df6c5452 Added pyaccuwage-checkseq. Everything works so far, currently the sequence comments are returned as string tuples. Next step is to take these results, convert them to integers, and make sure they occur in the expected linear order. Binh Nguyen 2013-03-30 13:15:23 -05:00
e8e57bb932 improved record detection, state records are now found Binh Nguyen 2013-03-26 13:23:48 -05:00
8cf78b5336 removed blank field counter, replaced with hash digest of rowspan Binh Nguyen 2013-03-20 15:49:16 -05:00
456c15eb1c Merge branch 'master' of brimstone.klowner.com:pyaccuwage Binh Nguyen 2013-03-20 15:19:31 -05:00
47f5021a84 changing repr Binh Nguyen 2013-03-20 15:18:12 -05:00
e0d54c8a01 merging Binh Nguyen 2013-03-20 15:15:51 -05:00
d058e64d26 tweaking validation Binh Nguyen 2013-03-20 15:13:44 -05:00
a1ab6b4918 Looks like 1220 form has changed since last year, work on getting changes applied in a simple manner. Binh Nguyen 2013-03-05 14:49:38 -06:00
afc4138898 fixed automatic model generation inheritance Binh Nguyen 2013-02-19 16:06:11 -06:00
b40e736ae0 bumping version, improving field type guessing Binh Nguyen 2013-02-19 15:55:05 -06:00
730073dcd1 working better! Binh Nguyen 2013-02-05 15:43:04 -06:00
e6e087ef38 Record merging seems to work now that header offsets have been corrected. There's an issue parsing p1220 on line 2570. Maybe making the parser ignore full-width lines during parsing would fix the problem, if there's some way to check the length of a row, only counting single-spaced words? Binh Nguyen 2013-01-29 15:48:32 -06:00
6e4a975cfb Changed the way records are found by searching for field headers and then working backwards to determine the record name. We also added the ability to "break" from reading a series of field definitions based on certain break points such as "Record Layout". There is currently an error in p1220 line 2704 which is caused by the column data starting on the 4th column "Description and Remarks". Binh Nguyen 2012-12-04 16:04:08 -06:00
8995f142e5 Merge branch 'master' of brimstone.klowner.com:pyaccuwage Binh Nguyen 2012-12-04 14:57:20 -06:00
6e1d02db8d trying new header location method Binh Nguyen 2012-12-04 14:54:10 -06:00
e9a6dc981f Refer to previous log, but also verify that records are returning proper information prior to getting passed into the ColumnCollector. It seems like some things are getting stripped out due to blank lines or perhaps the annoying "Record Layout" pages. If we could extract the "Record Layout" sections, things may be simpler. Binh Nguyen 2012-11-27 16:01:00 -06:00
31ff97db8a Almost have things working. It seems like some of the record results are overlapping. I'm assuming this is due to missing a continue or something inside the ColumnCollector. I added a couple new IsNextRecord exceptions in response to blank rows, but this may be causing more problems than expected. Next step is probably to check the records returned, and verify that nothing is being duplicated. Some of the duplicates may be filtered out by the RecordBuilder class, or during the fields filtering in the pyaccuwage-pdfparse script (see: fields). Binh Nguyen 2012-11-20 16:05:36 -06:00
1c7533973a Parsing all the way through the pdf appears to work. Next we need to track the beginning/ending points for each record and append continuation records onto the previous. There's some issue in the pyaccuwage-pdfparse script causing it to have problems reading the last record field in a record group. Maybe the record extractor needs to dump the last failed ColumnCollector rather than return it if it's determined to hold junk data? Binh Nguyen 2012-11-13 15:53:41 -06:00
fe4bd20bad Record detection seems to be working much better. We currently have an issue where full-page width blocks are being interpreted as a single large column, and then subsequent field definition columns are being truncated in as subcolumns. Binh Nguyen 2012-11-06 15:34:35 -06:00
46755dd90d updated VERSION Binh Nguyen 2012-10-16 13:22:44 -05:00
820f71b3f5 Merge branch 'master' of brimstone.klowner.com:pyaccuwage Binh Nguyen 2012-10-09 15:36:11 -05:00
6abfa5b345 fixed missing field, updated for 2012 Binh Nguyen 2012-10-09 15:31:35 -05:00
30376a54f3 fixed missing field, updated for 2012 Binh Nguyen 2012-10-09 15:31:35 -05:00
717f929015 updated records to match 2012 definitions Binh Nguyen 2012-09-25 15:45:00 -05:00
40fcbdc8b8 getting closer, added a FIXME to one of the fields. Having issues with columns in description fields Binh Nguyen 2012-07-17 15:44:28 -05:00
5dde3be536 forgot to convert tuple to list for the missing description field fix, derrrp Binh Nguyen 2012-07-17 14:16:28 -05:00
0dc55ab3dd fixed reading fields that don't have descriptions Binh Nguyen 2012-07-17 14:10:34 -05:00
b3aed20388 fixed rangetoken issue with single byte values Binh Nguyen 2012-07-10 15:41:47 -05:00
e8145c5616 adding new pdf extract capability Binh Nguyen 2012-07-10 15:24:13 -05:00
b77b80e485 We need to remove some of the yield statements because it's making iteration very confusing to keep track of, due to global iterators being passed around and iterated over in chunks. Binh Nguyen 2012-06-30 15:21:05 -05:00
6b5eb30f34 added ColumnCollector, fixed column parsing by scanning for whitespace before separating Binh Nguyen 2012-06-26 15:55:18 -05:00
fecd14db59 adding pdfextract for column extraction Binh Nguyen 2012-06-19 15:37:17 -05:00
770aeb0d2b Ranges in descriptions are ignored, except in cases where the range matches the next expected range. The only way to get around this seems to be to manually remove the range value from the input. Binh Nguyen 2012-06-06 14:46:17 -05:00
04b3c3f273 Added pyaccuwage-parse script. Binh Nguyen 2012-06-02 15:16:13 -05:00
69da154e59 attempting to add a commandline script Binh Nguyen 2012-06-02 14:18:48 -05:00
ad5262e37e added length checking to field matching criteria for parser Binh Nguyen 2012-05-08 14:08:39 -05:00
2c9551f677 Fixed issue with last item not being inserted into tokens. Now able to convert PDF text into record field definitions pretty reliably. Need to add additional field type detection rules. Binh Nguyen 2012-04-18 14:51:59 -05:00
027b44b65c Parser is mostly working, there's an issue with the last grouping of tokens not being parsed. This can probably be fixed by yielding an end-marker from the tokenizer generator so the compiler knows to clear out the last item. Binh Nguyen 2012-04-13 14:39:02 -05:00
6e9b8041b9 adding a simple parser for reading stuff from pdfs Binh Nguyen 2012-04-05 15:19:00 -05:00
97a74c09f9 fixed some field types, misc Binh Nguyen 2011-11-12 15:26:17 -06:00
7772ec679f Renamed "verify" functions to "validate". Binh Nguyen 2011-11-12 13:50:14 -06:00
ea492c2f56 renamed NumericField to IntegerField Binh Nguyen 2011-11-05 14:12:47 -05:00
a3f89e3790 fixed a couple field types being wrong, improved validation, auto-truncate over-length fields Binh Nguyen 2011-11-05 14:11:37 -05:00
076efd4036 0.0.6, fixed field types Binh Van Nguyen 2011-10-29 14:58:59 -05:00
7cb8bed61e Bumped version to 0.0.5 Fixed problem where fields contained shared values by performing a shallow copy on all fields during Record instantiation. That way, each record has its own copy of the field instances, rather than the shared class-wide instance provided by the definition. Binh Van Nguyen 2011-10-29 14:03:03 -05:00
4023d46b4a Changed a few fields to be optional. Binh Van Nguyen 2011-10-25 14:54:22 -05:00
775d3d3700 bump to v0.0.3 Binh Van Nguyen 2011-09-24 15:40:06 -05:00
3dfcf030e7 I can't type Binh Van Nguyen 2011-09-24 15:27:55 -05:00
c8965afab5 changing to version 0.0.2 Binh Van Nguyen 2011-09-24 13:32:31 -05:00
93d7465e1a promoting to v0.2 Binh Van Nguyen 2011-09-24 13:29:42 -05:00
1a0f4183e7 Everything works, or seems to. The package is now installable as a regular python module through pip or whatever. Now our apps can assemble data objects to be converted into accuwage files. Binh Van Nguyen 2011-09-17 11:22:04 -05:00
6f5d29faab moved everything into pyaccuwage subdir Binh Van Nguyen 2011-06-25 15:08:38 -05:00
5eb8925032 added __init__ to setup Binh Van Nguyen 2011-06-25 15:02:06 -05:00
78f8b845fe fixed set>setup Binh Van Nguyen 2011-06-25 14:59:18 -05:00
3d6a64db1d added test setup.py Binh Van Nguyen 2011-06-25 14:57:30 -05:00
ab16399e19 made enum names consistent Binh Van Nguyen 2011-06-25 14:33:28 -05:00
5f9211f30a fixed two silly syntax errors Binh Van Nguyen 2011-06-25 14:31:52 -05:00
0646bf7b9b Added record validation functions for everything (that we saw in the PDF), we should go over them once more to make sure we didn't miss anything, but testing validation should probably be done after that. Verify that the record ordering enforcement code is correct, then start thinking of how to get data from external sources into the record generator. Binh Van Nguyen 2011-06-11 14:45:12 -05:00
7dcbd6305b Added country code list to enums Binh Van Nguyen 2011-06-04 15:52:48 -05:00
a0014ca451 Added a MonthYear field, fixed some field required values and fixed validation functions. Added numeric state abbreviation capability. So far everything appears to be working good. Binh Van Nguyen 2011-06-04 15:46:41 -05:00
5781cbf335 Finished up most of the record order validation and also checking for all required records in a set. Added a controller class but decided to put stuff in __init__ instead, at least for now. Added a DateField which converts datetime.date into the proper string format for EFW2 files (hopefully), this should still be tested next week. Binh Van Nguyen 2011-05-07 15:19:48 -05:00
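
Commit 7cb8bed61e (version 0.0.5) in the log above describes a fix for fields sharing values across records: each field is shallow-copied during Record instantiation, so every record gets its own field instances instead of the class-wide ones from the definition. The sketch below illustrates that idea only. The `Field`, `Record`, and `EmployeeRecord` classes and their attributes are hypothetical stand-ins written for this example, not pyaccuwage's actual code or API.

```python
import copy


class Field:
    """Hypothetical field holding one value (not the real pyaccuwage class)."""

    def __init__(self, max_length):
        self.max_length = max_length
        self.value = None


class Record:
    """Hypothetical record base; fields are declared as class attributes."""

    def __init__(self):
        # Walk the class hierarchy (base classes first) and give this instance
        # its own shallow copy of every class-level Field, so that setting a
        # value on one record does not leak into other records.
        for klass in reversed(type(self).__mro__):
            for name, attr in vars(klass).items():
                if isinstance(attr, Field):
                    setattr(self, name, copy.copy(attr))


class EmployeeRecord(Record):
    # Assumed example field; the name and length are made up for illustration.
    first_name = Field(max_length=15)


a = EmployeeRecord()
b = EmployeeRecord()
a.first_name.value = "ALICE"
print(b.first_name.value)  # None: b has its own copy of the field
```

Without the copy in `__init__`, `a.first_name` and `b.first_name` would both resolve to the single `Field` object stored on the class, which is the shared-value problem the commit message describes.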