
Author Topic: Taazz nice work on SDFDataset - and my nasty test code  (Read 26689 times)

BigChimp

  • Hero Member
  • *****
  • Posts: 5740
  • Add to the wiki - it's free ;)
    • FPCUp, PaperTiger scanning and other open source projects
Taazz nice work on SDFDataset - and my nasty test code
« on: September 15, 2012, 02:51:53 pm »
Taazz & everybody,

I've been stripping out your comments and FPC 2.6 adaptations on your sdfdata.pp (in http://lazarus.freepascal.org/index.php/topic,18214.0.html) so that it can be prepared as a patch for current FPC trunk.

BTW: may I say I hate Delphi's SDF "format" with a passion? It's so close to CSV yet it's not CSV. Delphi's own documentation is in contradiction at least with what Turbo Delphi 2006 produces.
For my synopsis of what sdf should do, please see
http://wiki.lazarus.freepascal.org/CSV#SDF_format

Michael Van Canneyt graciously committed a test program of mine to FPC trunk: packages/fcl-db/tests/tcsdfdata.pp
Not everybody is working with trunk and the test program has some flaws too (e.g. field length).

I've been running a modified test program with taazz' code to get it to spit out 0 errors.

Unfortunately, it doesn't yet, completely.
TestInputOurFormat seems to show that taazz' adaptations work: sdfdataset now correctly reads multiline data!!  ;D ;D ;D

I'm having 2 problems:
1. The TestOutput test demonstrates sdfdataset thinks the last record is duplicate. Is it my test or is it the sdfdataset code? The same symptom occurs when reading sdfdataset from file, see the TestDelimitedTextOutput test
2. It still does not deal well with quotes (or the lack of them): this single field (yes, the data includes the quotes)
Quote
"Delimiter,""and"";quote"
gets read and split up into multiple fields. As far as I understand that shouldn't happen, right?
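
For comparison, here is a minimal sketch of what a plain TStringList makes of that field. It is not the sdfdataset code path; the program name and the extra "second" field are made up for illustration. With StrictDelimiter set to True only the comma counts as a delimiter, so by the usual quoting rules the first field should come back as a single item with the doubled quotes collapsed.

```pascal
program quotedfield;
{$mode objfpc}{$H+}
uses
  Classes, SysUtils;
var
  Fields: TStringList;
  i: Integer;
begin
  Fields := TStringList.Create;
  try
    Fields.Delimiter := ',';
    Fields.QuoteChar := '"';
    // Only the delimiter separates fields; spaces are left alone.
    Fields.StrictDelimiter := True;
    // The problem field from above, followed by a made-up second field.
    Fields.DelimitedText := '"Delimiter,""and"";quote",second';
    for i := 0 to Fields.Count - 1 do
      WriteLn(i, ': [', Fields[i], ']');
  finally
    Fields.Free;
  end;
end.
```

If the item count printed here differs from the field count sdfdataset reports for the same line, that would point at the dataset's own splitting code rather than at the test data.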

Attached taazz' sdfdata.pp adapted for FPC trunk (please rename sdfdata.pp.2.6 if running 2.6.0) and the test program files. Please put them in the same directory, compile them, and run with
Code:
testmultiline --all --format=plain

Comments, advice, patches gratefully received!
« Last Edit: September 15, 2012, 03:53:08 pm by BigChimp »
Want quicker answers to your questions? Read http://wiki.lazarus.freepascal.org/Lazarus_Faq#What_is_the_correct_way_to_ask_questions_in_the_forum.3F

Open source including papertiger OCR/PDF scanning:
https://bitbucket.org/reiniero

Lazarus trunk+FPC trunk x86, Windows x64 unless otherwise specified

BigChimp

Re: Taazz nice work on SDFDataset - and my nasty test code
« Reply #1 on: September 15, 2012, 03:52:52 pm »
Sorry, stupid error in testsdfmultiline. fixed version attached.

Time for a break, I think...
« Last Edit: September 15, 2012, 05:26:43 pm by BigChimp »

BigChimp

Re: Taazz nice work on SDFDataset - and my nasty test code
« Reply #2 on: September 15, 2012, 04:29:16 pm »
And now the debugger on Laz trunk doesn't want to stop on breakpoints. Stabs, dwarf, auto doesn't matter.

Where did that last strand of my hair go?

taazz

  • Hero Member
  • *****
  • Posts: 5368
Re: Taazz nice work on SDFDataset - and my nasty test code
« Reply #3 on: September 15, 2012, 04:51:58 pm »
Well, I can give you some of mine; they're lying all around me, if that would help.

I'm going for a reinstallation myself as well. That is why I hate building the IDE to install components:
one wrong move or switch in one of those packages and I'm left in the cold trying to find which of the thousands of switches does not work any more. Oh well, another hour down the drain, I suppose.
Good judgement is the result of experience … Experience is the result of bad judgement.

OS : Windows 7 64 bit
Laz: Lazarus 1.4.4 FPC 2.6.4 i386-win32-win32/win64

BigChimp

Re: Taazz nice work on SDFDataset - and my nasty test code
« Reply #4 on: September 15, 2012, 07:12:37 pm »
Quote
And now the debugger on Laz trunk doesn't want to stop on breakpoints. Stabs, dwarf, auto doesn't matter.

Where did that last strand of my hair go?

Thanks to ludob for figuring it out:
passing --all in run parameters doesn't work; passing -a or "--all" does work

reported as
http://bugs.freepascal.org/view.php?id=22893
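
A quick way to check what actually reaches the program when it is started from the IDE is to dump the parameters. This is just a throwaway sketch (the program name is made up); run it with the same run parameters as the test runner and compare the output for --all, -a and "--all".

```pascal
program showparams;
{$mode objfpc}{$H+}
var
  i: Integer;
begin
  // Print every command-line parameter exactly as the program receives it.
  WriteLn('ParamCount = ', ParamCount);
  for i := 1 to ParamCount do
    WriteLn(i, ': [', ParamStr(i), ']');
end.
```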

BigChimp

Re: Taazz nice work on SDFDataset - and my nasty test code
« Reply #5 on: September 15, 2012, 08:05:08 pm »
... and something more: loading an empty file with FirstLineAsSchema results in a recordcount of 1. That's probably also the case for all other record counts...

http://bugs.freepascal.org/view.php?id=22894

I can't believe nobody wrote a test harness for this piece of #()%#()$*#$%#@ before.

taazz

Re: Taazz nice work on SDFDataset - and my nasty test code
« Reply #6 on: September 15, 2012, 08:22:24 pm »
Just a heads up: I'm going to look into this as well, although it might need some heavy restructuring. The way it is currently structured gives me the impression that someone placed code wherever it was convenient, not where it is supposed to go. For example, I am still looking for where the field separation code is and how it works. I found something that looks like it in InternalInitFieldDefs, which has nothing to do with the fields' positions in the buffer, but I haven't had time to verify it yet; it might be parsing of the schema line.

BigChimp

Re: Taazz nice work on SDFDataset - and my nasty test code
« Reply #7 on: September 15, 2012, 08:25:57 pm »
Ok. I'm going to lay off it for a while.

If you are going to edit, and if you want to, there's no need to add the comments etc. (except those helpful ones for FPC 2.6.0 compatibility that indicate what I need to strip for FPC trunk).
I can use winmerge or diff to figure out the differences from the current version.

Edit: if you would be so kind though, perhaps we could write up as many test cases as possible (that would help keep track of regressions etc.).
(OT: and I use my own local Mercurial repository to keep track of what I do - others swear by Subversion or Git)

Anyway, thanks a lot for the great help up to now ;)

Thanks,
BigChimp
« Last Edit: September 15, 2012, 08:27:54 pm by BigChimp »

BigChimp

Re: Taazz nice work on SDFDataset - and my nasty test code
« Reply #8 on: September 16, 2012, 04:02:21 am »
Latest test code:
- contains taazz code for fpc 2.6 as well as trunk
- incorporates expanded test data set (includes expanded tcsdfdata.pp).
- Keeps existing console test runner (testmultiline). Run with testmultiline -a --format=plain
- Added GUI test runner: testgui.lpi. Runs the same tests but easier to see results within Lazarus or outside

BigChimp

Re: Taazz nice work on SDFDataset - and my nasty test code
« Reply #9 on: September 17, 2012, 07:16:36 am »
I agree that sdfdataset probably could do with a rewrite.
Things that worry me are e.g. the short fixed field length you seem to get when you have a header line in the file, and the fact that, as you said, taazz, quoting support etc. had to be bolted on, making it very unwieldy.
Meanwhile, extended test coverage is in the dbtest framework in FPC trunk. (Make sure to copy database.ini.txt to database.ini and change the chosen connector from bufdataset to sdfdataset.) You can then run e.g. the GUI test runner project in the db tests directory.

Perhaps we could use bufdataset to store the data which would only leave reading/parsing and saving the sdf format from/to file.
We would inherit bufdataset's bugs etc but as SQL database support is based on this, there is an incentive to fix these (and perhaps the bugs won't affect us)

I think I'm going to trial this with a new csvdataset based on csvdocument (see wiki) to parse csv/tab separated values/x-separated values.
If that works, I'll have a look at sdfdataset.

taazz

Re: Taazz nice work on SDFDataset - and my nasty test code
« Reply #10 on: September 17, 2012, 03:14:32 pm »
OK. Downloaded your files. I have a working development folder, thanks to fpcup, so I'll start running some test cases tonight to see what is going on and how I can improve it. I'll post any findings here.

taazz

Re: Taazz nice work on SDFDataset - and my nasty test code
« Reply #11 on: September 19, 2012, 06:26:06 am »
Find attached the new and improved TSDFDataset and its test cases.

The changes are not as extensive as I was afraid they would be. So far your tests run correctly. I had to change one or two tests because they didn't quote properly; other than that I haven't touched anything else. The current changes are:

1) RecordCount does not include the schema line any more.
2) FieldDefs can be used to define a schema instead of the schema list. If both Schema and FieldDefs are set, Schema takes precedence.
3) Some minor changes, and a possible infinite loop problem that I still need to test.
4) Laid the foundations for a dataset with stricter validation. I first need to map out how to handle differences between the file schema and the FieldDefs collection.

PS. FPC 2.6.1 does not need the sdfdata26.pp so I haven't touched that one.
« Last Edit: September 19, 2012, 06:28:00 am by taazz »

BigChimp

Re: Taazz nice work on SDFDataset - and my nasty test code
« Reply #12 on: September 19, 2012, 01:10:53 pm »
Thanks a lot, taazz!

I don't really see why you changed the delim.csv test to include quoting, given this remark in the code:
Quote
// See test results from bug 19610 for evidence that the strings below should work.
In other words: Delphi emits these kinds of strings as valid sdfdata. Doesn't it make sense that sdfdataset should then be able to read them? I'll double check the output; using a stringlist and relying on FPC's DelimitedText was not a smart thing to do, as bugs in FPC's DelimitedText will lead to test failures here.
Better to just write the raw string to file... I'll adapt the test.
Edit: the test does need to be adapted... please wait ;)

Could you/anybody confirm that Delphi generates
Quote
"Delimiter,""and"";quote","J""T""",Just a long line,"Just a quoted long line","multi
line","Delimiter,and;done","Some ""random"" ""quotes"
if so... I'll add it to the test set as another test.

Thanks again,
BigChimp
« Last Edit: September 19, 2012, 01:48:45 pm by BigChimp »

taazz

Re: Taazz nice work on SDFDataset - and my nasty test code
« Reply #13 on: September 19, 2012, 02:13:30 pm »
Quote
Thanks a lot, taazz!

I don't really see why you change the delim.csv test to include quoting  given this remark in the code:
// See test results from bug 19610 for evidence that the strings below should work.

Two reasons:
1) I was focused on the CSV RFC (I think it is included in the zip I attached), which clearly states that a quote character inside a field is illegal unless the field is quoted.
2) I posted the changes that I think make the component easier to use and that try to merge FieldDefs and schema.
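
To make the RFC rule concrete, here is a minimal sketch of a writer-side helper. QuoteField is a hypothetical name, not something in sdfdata.pp: a field is wrapped in quotes only when it contains the delimiter, a quote or a line break, and embedded quotes are doubled.

```pascal
program rfcquote;
{$mode objfpc}{$H+}
uses
  SysUtils;

// Hypothetical helper: quote one field the way RFC 4180 describes.
function QuoteField(const Value: string; Delim: Char): string;
begin
  if (Pos(Delim, Value) > 0) or (Pos('"', Value) > 0) or
     (Pos(#13, Value) > 0) or (Pos(#10, Value) > 0) then
    // Wrap in quotes and double every embedded quote.
    Result := '"' + StringReplace(Value, '"', '""', [rfReplaceAll]) + '"'
  else
    // Nothing special in the field: write it as-is.
    Result := Value;
end;

begin
  WriteLn(QuoteField('plain', ','));
  WriteLn(QuoteField('Delimiter,"and";quote', ','));
end.
```

The second call produces the quoted form of the field from the first post, "Delimiter,""and"";quote" (with the outer quotes), which is why a bare quote inside an unquoted field has no valid reading under the RFC.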

There are a number of issues I'm thinking of addressing on top of the existing base when I get more time, for example:
  • Use FieldDefs as a validation mechanism to ensure that the file opened is in the format the user expects, i.e. raise an error if the file does not contain enough fields for all the FieldDefs, and ignore any extra fields while using the dataset but not when saving the data back to disk.
  • Support field types other than string, and the existing validation/editing mechanism that the DB framework has built in.
  • Add support for more specialized types with constraints, e.g. blobs/images, which will be saved encoded as either uuencode or enc64. The constraint here is that the size will be the maximum number of bytes allowed in each blob, and a blob larger than the size specified will never be saved.
  • Change the way that spaces are handled, to avoid trimming spaces that are part of the field's value, whether typed by the user or read from the file.
  • Add some kind of auto-type recognition to allow converting data to specific types, e.g. date, integer, extended etc.
  • Add support for locale-specific recognition, e.g. don't allow the use of the comma ',' as a delimiter if it is used as the decimal separator.

That's off the top of my head; I may have missed something.

In any case, is Delphi compatibility a requirement? Should we only support reading those inappropriately formatted fields? If yes, I will revert the changes and try to implement it.

BigChimp

Re: Taazz nice work on SDFDataset - and my nasty test code
« Reply #14 on: September 19, 2012, 03:02:54 pm »
taazz: yes, if you want to write a csv dataset, be my guest, but sdf is different from csv (sigh).

In my opinion, the sdf data format is just plain )$*(%#()*$% ugly. Interoperability with Delphi is (I think) about the only reason anybody would want this abomination.
I would definitely try to support everything a normal Delphi app would spit out - suggested: with strictdelimiter:=false, as that is the default.
Unfortunately, for lack of a better alternative, loads of people (including me in the beginning) insisted on using sdfdataset to load their csv files, which often works but breaks horribly on boundary conditions where
- the sdf specs and/or
- the Delphi implementation, which deviates a bit from the spec, and/or
- the FPC implementation, which differs rather more from the spec (see bug 19610)
differ from what any sane csv format would be.

Looking at the Delphi output for bug report 19610:
Code:
normal_string;quoted_string;"quoted;delimiter";quoted and space;"""quoted_and_starting_quote";"""quoted, starting quote, and space";quoted_with_tab character;quoted_multi
line;  UnquotedSpacesInfront;UnquotedSpacesAtTheEnd   ;"  ""Spaces before quoted string""";Spaces after quoted string;   ;
gives:
(The numbers below indicate field number)
Code:
Resulting elements with strictdelimiter false:
0normal_string
1quoted_string
2quoted;delimiter
3quoted and space
4"quoted_and_starting_quote
5"quoted, starting quote, and space
6quoted_with_tab character
7quoted_multi
line
8UnquotedSpacesInfront
9UnquotedSpacesAtTheEnd
10Spaces before quoted string
11Spaces after quoted string
12

Well, perhaps supporting the spaces after quoted string thing is too much.
I'm almost done cleaning up the test cases to closely match the Delphi test program in 19610.
I'll separate out the Spaces after quoted string case, and remove some of your added quote tests.

Understand and agree with your further changes, but I think those may actually be better done in a CSV dataset.

I would really like to see an RFC 4180 compliant CSV dataset, and I would *strongly* suggest you take a look at building on csvdocument (see the wiki), as its csvparser beautifully supports all the intricacies of RFC 4180, as well as Excel mode etc.
This means we don't need to implement a parser of our own.
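
As a rough illustration of what that could look like, here is a minimal sketch. It assumes the csvdocument unit is on the unit path, and that TCSVDocument exposes Delimiter, QuoteChar, CSVText, RowCount, ColCount and Cells the way I remember them; please check against the actual unit. The sample data is made up.

```pascal
program csvdocdemo;
{$mode objfpc}{$H+}
uses
  Classes, SysUtils, csvdocument;
var
  Doc: TCSVDocument;
  Row, Col: Integer;
begin
  Doc := TCSVDocument.Create;
  try
    Doc.Delimiter := ',';
    Doc.QuoteChar := '"';
    // Two made-up rows; the first starts with the problem field from this thread.
    Doc.CSVText := '"Delimiter,""and"";quote",second' + LineEnding + 'a,b';
    // Walk every cell; rows may have different column counts.
    for Row := 0 to Doc.RowCount - 1 do
      for Col := 0 to Doc.ColCount[Row] - 1 do
        WriteLn(Row, ',', Col, ': [', Doc.Cells[Col, Row], ']');
  finally
    Doc.Free;
  end;
end.
```

A dataset built on this would only need to copy the cells into its own record storage, leaving all the quoting logic to the parser.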

The rest of the dataset support could be built on this, e.g. by using memds or bufdataset or possibly ripping out the sdfdataset code (which I have my doubts about but by now you know much more about it than I).

Writing out the csv to file should once again be easy as csvdocument has a class for that as well.

I'll polish up the test cases for sdfdataset and post them...

I await your opinion with interest!
Thanks,
BigChimp

 
