Commit graph

94 commits

Author SHA1 Message Date
bfd43b7448 release 0.2018.2 2020-06-12 14:45:08 -05:00
1f1d3dd9bb Merge branch 'conversion-support' 2020-06-12 13:13:28 -05:00
431b594c1e add pyaccuwage-convert 2020-06-12 13:10:13 -05:00
8f86f76167 add format interchange functions, add tests, fix stuff 2020-06-12 13:07:41 -05:00
6af5067fca add option for record delimiter 2019-01-30 14:25:24 -06:00
250ca8d31f fix flubbed blank field specifier on StateTotalRecordIA 2019-01-28 13:29:14 -06:00
7ddcfcc1c3 clean up some indent 2019-01-27 10:36:37 -06:00
d08f1ca586 hopefully fix python 2 and 3 compatibility 2019-01-27 09:30:22 -06:00
6381f8b1ec bump version to 2018.01 2019-01-26 16:11:24 -06:00
7c32cb0dd3 add StateTotalRecord for Iowa 2019-01-26 11:49:31 -06:00
5afdcd6a50 add 'permitted_benefits_health' to RT and RO records for 2017 2018-01-27 11:38:23 -06:00
706c39f7bb CRLFField return binary data for get_data() 2017-10-29 10:41:52 -05:00
078273f49f fix json encoding by encoding bytes as ascii 2017-01-07 17:00:29 -06:00
9320c68961 use BytesIO to work with python3 2017-01-07 14:52:33 -06:00
16bf2c41d0 run through 2to3 2017-01-07 13:58:33 -06:00
961aedc0ae Added very important data cleaning
TextField now cleans CR and LF from data, this is very important for not
breaking everything and leaving me completely confused.

Thank you, Lauren!
2014-02-01 15:10:40 -06:00
fc04a66869 Fixed debugging output 2014-02-01 12:57:36 -06:00
4eedab0e7c Added default record length 2013-10-11 00:21:28 -05:00
03ce460181 Completed JSON importer. Export of the imported data matches the original data, so it must be working 2013-05-21 13:36:44 -05:00
7f9e5dbf65 added json encoder and partially functioning json decoder 2013-05-14 13:48:48 -05:00
b9982c3a21 added missing modeldef.py and fixed genfieldfill 2013-04-20 14:34:14 -05:00
9bbe100929 added pyaccuwage-genfieldfill 2013-04-20 14:31:09 -05:00
ef9f012bd2 added checkseq to scripts in setup.py 2013-04-20 13:03:09 -05:00
6bff5da58b pyaccuwage-checkseq now reports error lines when it encounters out-of-sequence field comments 2013-04-13 12:31:11 -05:00
c6df6c5452 Added pyaccuwage-checkseq. Everything works so far, currently
the sequence comments are returned as string tuples. Next step
is to take these results, convert them to integers, and make sure
they occur in the expected linear order.
2013-03-30 13:15:23 -05:00
e8e57bb932 improved record detection, state records are now found 2013-03-26 13:23:48 -05:00
8cf78b5336 removed blank field counter, replaced with hash digest of rowspan 2013-03-20 15:49:16 -05:00
456c15eb1c Merge branch 'master' of brimstone.klowner.com:pyaccuwage
Conflicts:
	pyaccuwage/pdfextract.py
2013-03-20 15:19:31 -05:00
47f5021a84 changing repr 2013-03-20 15:18:12 -05:00
e0d54c8a01 merging 2013-03-20 15:15:51 -05:00
d058e64d26 tweaking validation 2013-03-20 15:13:44 -05:00
a1ab6b4918 Looks like 1220 form has changed since last year, work on getting
changes applied in a simple manner.
2013-03-05 14:49:38 -06:00
afc4138898 fixed automatic model generation inheritance 2013-02-19 16:06:11 -06:00
b40e736ae0 bumping version, improving field type guessing 2013-02-19 15:55:05 -06:00
730073dcd1 working better! 2013-02-05 15:43:04 -06:00
e6e087ef38 Record merging seems to work now that header offsets have been corrected.
There's an issue parsing p1220 on line 2570. Maybe making the parser ignore
full-width lines during parsing would fix the problem, if there's some
way to check the length of a row, only counting single-spaced words?
2013-01-29 15:48:32 -06:00
6e4a975cfb Changed the way records are found by searching for field headers and then working
backwards to determine the record name. We also added the ability to "break" from
reading a series of field definitions based on certain break points such as
"Record Layout". There is currently an error in p1220 line 2704 which is caused
by the column data starting on the 4th column "Description and Remarks".

If ColumnCollectors started with the field titles, and had awareness of the column
positions starting with those, it may be possible to at least read the following
record fields without auto-adjusting them.
2012-12-04 16:04:08 -06:00
8995f142e5 Merge branch 'master' of brimstone.klowner.com:pyaccuwage
Conflicts:
	pyaccuwage/pdfextract.py
2012-12-04 14:57:20 -06:00
6e1d02db8d trying new header location method 2012-12-04 14:54:10 -06:00
e9a6dc981f Refer to previous log, but also verify that records are returning
proper information prior to getting passed into the ColumnCollector.
It seems like some things are getting stripped out due to blank lines
or perhaps the annoying "Record Layout" pages. If we could extract the
"record layout" sections, things may be simpler.
2012-11-27 16:01:00 -06:00
31ff97db8a Almost have things working. It seems like some of the record results
are overlapping. I'm assuming this is due to missing a continue
or something inside the ColumnCollector. I added a couple new IsNextRecord
exceptions in response to blank rows, but this may be causing more problems
than expected. Next step is probably to check the records returned, and verify
that nothing is being duplicated. Some of the duplicates may be filtered out
by the RecordBuilder class, or during the fields filtering in the pyaccuwage-pdfparse
script (see: fields).
2012-11-20 16:05:36 -06:00
1c7533973a Parsing all the way through the pdf appears to work. Next we need
to track the beginning/ending points for each record and append
continuation records onto the previous. There's some issue in
the pyaccuwage-pdfparse script causing it to have problems reading
the last record field in a record group. Maybe the record extractor
needs to dump the last failed ColumnCollector rather than return it
if it's determined to hold junk data?

The record builder seems to handle everything just fine.

Added a function to the field name parsing to replace ampersands
with an "and" string so as not to cause problems with variable names.
2012-11-13 15:53:41 -06:00
fe4bd20bad Record detection seems to be working much better. We currently have
an issue where full-page width blocks are being interpreted as a
single large column, and then subsequent field definition columns
are being truncated in as subcolumns.

The current problematic line in p1220 is 1598.

Maybe add some functionality which lets us specify the number of
columns we're most interested in? Automatically discard 1-column
ColumnCollectors maybe?
2012-11-06 15:34:35 -06:00
46755dd90d updated VERSION 2012-10-16 13:22:44 -05:00
820f71b3f5 Merge branch 'master' of brimstone.klowner.com:pyaccuwage 2012-10-09 15:36:11 -05:00
6abfa5b345 fixed missing field, updated for 2012 2012-10-09 15:35:13 -05:00
30376a54f3 fixed missing field, updated for 2012 2012-10-09 15:31:35 -05:00
717f929015 updated records to match 2012 definitions 2012-09-25 15:45:00 -05:00
40fcbdc8b8 getting closer, added a FIXME to one of the fields. Having issues with columns in description fields 2012-07-17 15:44:28 -05:00
5dde3be536 forgot to convert tuple to list for the missing description field fix, derrrp 2012-07-17 14:16:28 -05:00