
Topic: Why is my packed record structure and CompareMem function failing?

KpjComp

  • Hero Member
  • Posts: 680
Re: Why is my packed record structure and CompareMem function failing?
« Reply #15 on: April 09, 2012, 08:16:08 pm »
Quote
Like I said a bit earlier, it is possible that file data has the word FILE0 inside it.

My routine is to replicate what Grep does, nothing more, nothing less.

To add:
From what I can gather, Ted is parsing some MFT records and is simply searching for the magic FILE0 tag inside the NTFS MFT file he has. I assume details of the Microsoft MFT file structure are not easy to find or follow, which is why he's searching for the magic pattern. Also, isn't the MFT just pointers to the data files, so there's no data there anyway, apart from maybe the filename? IOW, the data will be spread all over the place on the HD.
« Last Edit: April 09, 2012, 08:45:49 pm by KpjComp »

KpjComp

  • Hero Member
  • Posts: 680
Re: Why is my packed record structure and CompareMem function failing?
« Reply #16 on: April 09, 2012, 11:16:47 pm »
Ok, I've updated the TBlockSearch class.

One problem I mentioned was with large search terms. That was caused by the way I was filling the FIFO buffer, which I did that way so that CompareMem would work.

Now, with some small modifications, large search terms are no longer a problem, and it's also twice as fast as my previous version, e.g. a 20 GB search takes about 5 minutes.

Basically, instead of shifting the FIFO buffer with Move, I've created a simple ring buffer. CompareMem of course can't handle a ring buffer (the search window wraps around the end of the array), so I do the comparison manually inside the CheckPos sub-procedure.
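To illustrate the ring-buffer idea on its own, here is a minimal in-memory sketch. `RingSearch` and `TInt64Array` are hypothetical names, not part of TBlockSearch. The window start is tracked as "bytes consumed minus pattern length", and nothing is compared until the ring has been filled once, so each offset is checked exactly once.

```pascal
program RingSearchDemo;

{$mode objfpc}{$H+}

type
  TInt64Array = array of Int64;

// Hypothetical helper: returns every offset at which Pattern occurs in Data,
// keeping only the last Length(Pattern) bytes in a ring buffer.
function RingSearch(const Data, Pattern: array of Byte): TInt64Array;
var
  Ring: array of Byte;
  Len, Head, I, J, P, Count: Integer;
  Consumed: Int64;
  Match: Boolean;
begin
  Result := nil;
  Len := Length(Pattern);
  if (Len = 0) or (Length(Data) < Len) then
    Exit;
  SetLength(Ring, Len);
  Head := 0;      // slot for the next byte; once the ring is full it is also the oldest byte
  Consumed := 0;  // bytes pushed into the ring so far
  Count := 0;
  for I := 0 to High(Data) do
  begin
    Ring[Head] := Data[I];          // overwrite the oldest byte with the newest
    Inc(Head);
    if Head >= Len then
      Head := 0;                    // wrap around, same effect as Head mod Len
    Inc(Consumed);
    if Consumed < Len then
      Continue;                     // window not full yet
    // Compare starting at Head (the oldest byte), wrapping P as we go
    Match := True;
    P := Head;
    for J := 0 to Len - 1 do
    begin
      if Ring[P] <> Pattern[J] then
      begin
        Match := False;
        Break;
      end;
      Inc(P);
      if P >= Len then
        P := 0;
    end;
    if Match then
    begin
      SetLength(Result, Count + 1);
      Result[Count] := Consumed - Len;  // offset of the window's first byte
      Inc(Count);
    end;
  end;
end;

const
  // 'FILE0' #0 'FILE0' #1
  Sample: array[0..11] of Byte = (70, 73, 76, 69, 48, 0, 70, 73, 76, 69, 48, 1);
  Magic: array[0..4] of Byte = (70, 73, 76, 69, 48);  // 'FILE0'

var
  Hits: TInt64Array;
  K: Integer;
begin
  Hits := RingSearch(Sample, Magic);
  for K := 0 to High(Hits) do
    WriteLn(Hits[K]);   // prints 0, then 6
end.
```

The two `if ... then ... := 0` wraps play the role of `mod`, matching the comment in CheckPos that the `if` seems quicker than `mod`.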

There was also a slight bug: Integer (or even LongInt), both 32-bit here, is not big enough for positions in 20 GB files, so I've switched to Int64.
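A quick check of the arithmetic behind the Int64 change, as a small sketch (the program name is made up). In `{$mode objfpc}`, Integer is the same 32-bit type as LongInt, so its largest value is about 2 GB.

```pascal
program FileOffsetDemo;

{$mode objfpc}{$H+}

var
  TwentyGiB: Int64;
begin
  // Int64(20) makes the whole product evaluate as Int64
  TwentyGiB := Int64(20) * 1024 * 1024 * 1024;
  WriteLn('High(LongInt) = ', High(LongInt));  // 2147483647
  WriteLn('20 GiB        = ', TwentyGiB);      // 21474836480
  // An offset near the end of such a file cannot be held in a 32-bit Integer/LongInt
  WriteLn('Fits in LongInt: ', TwentyGiB <= High(LongInt));
end.
```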

I'm sure there are ways to make this faster; e.g. Lazarus has its own Search in Files, which I assume does even cleverer things for its grep.

Code:
unit block_search;

{$mode objfpc}{$H+}

interface

uses
  Classes, SysUtils, fgl;

type

  { TBlockSearch }
  TBlockSearchResults = specialize TFPGList<int64>;

  TBlockSearch = class
  private
    src:TStream;
    fresults:TBlockSearchResults;
    block:array of byte;
  public
    procedure SearchFor(a:array of byte);
    constructor Create(_Src:TStream; blocksize:integer = 1024*1024);
    destructor Destroy; override;
    property Results:TBlockSearchResults read fResults;
  end;

implementation

{ TBlockSearch }

procedure TBlockSearch.SearchFor(a: array of byte);
var
  readsize:integer;
  fPos:Int64;   // number of bytes pushed into the ring buffer so far
  fifoBuff:array of byte;
  fifoSt,fifoEn,searchLen,lpByte,firstByte:integer;

  // Compare the search term with the ring buffer, starting at its oldest byte
  procedure CheckPos;
  var
    l,p:integer;
  begin
    p := fifoSt;
    for l := 0 to pred(searchLen) do
    begin
      if a[l] <> fifoBuff[p] then exit;
      //p := (p+1) mod searchLen,   the if seems quicker
      inc(p); if p >= searchLen then p := 0;
    end;
    // the window ends at fPos, so it starts searchLen bytes earlier
    fresults.Add(fPos-searchLen);
  end;

begin
  fresults.Clear;
  searchLen := length(a);
  if searchLen = 0 then exit;
  if searchLen > length(block) then
    raise Exception.Create('Search term larger than blocksize');
  src.Position:=0;
  readsize := src.Read(block[0],Length(block));
  if readsize < searchLen then exit;
  // prime the ring buffer with the first searchLen bytes
  setlength(fifoBuff,searchLen);
  move(block[0],fifoBuff[0],searchLen);
  fPos:=searchLen;
  fifoSt:=0;
  fifoEn:=searchLen-1;
  CheckPos;
  // those bytes are already in the ring buffer, so skip them in the first block
  firstByte := searchLen;
  while readsize > 0 do
  begin
    for lpByte := firstByte to pred(readsize) do
    begin
      // drop the oldest byte and append the next one
      inc(fifoSt); if fifoSt>=searchLen then fifoSt := 0;
      inc(fifoEn); if fifoEn>=searchLen then fifoEn := 0;
      fifoBuff[fifoEn] := block[lpByte];
      inc(fPos);
      CheckPos;
    end;
    firstByte := 0;
    readsize := src.Read(block[0],Length(block));
  end;
end;

constructor TBlockSearch.Create(_Src: TStream; blocksize: integer);
begin
  inherited Create;
  setlength(block,blocksize);
  src := _src;
  fresults := TBlockSearchResults.Create;
end;

destructor TBlockSearch.Destroy;
begin
  freeAndNil(fresults);
  inherited Destroy;
end;

end.

Gizmo

  • Hero Member
  • Posts: 831
Re: Why is my packed record structure and CompareMem function failing?
« Reply #17 on: April 10, 2012, 10:05:56 pm »
Dudes

What I have found in my (fairly short) time programming is that, although my code is not advanced, clever or even sensible, I usually understand it. I find other people's examples, especially long ones, tricky to follow. I am, of course, very grateful for everyone's help, especially KpjComp, who has clearly spent a significant amount of time helping.

Consequently, I went away, tried to work this out myself and came up with the following, which works: it finds the same number of entries as grep and records the offset of each one. I can also fill my packed arrays with other relevant data, because whenever it finds an entry it goes back to where it found it (during the same loop) and reads approximately 1024 bytes of data:

Code:
type
  MFTRecords = packed record
    // MFT Records have 41 bytes of structured data. The first 5 bytes are usually FILE0
    // We have a separate array stored as a record for storing that 'FILE0':
    FILE0MagicMarker : array[0..4] of byte;
  end;

  // Now we have another record for storing the rest of the 42 byte header, attributes and data
  // that is populated by data found immediately after FILE0 entries
  MFTRecordStructure = packed record
    FixupSequenceOffset:     array[0..1] of byte;
    UpdateSequenceSize:      array[0..1] of byte;
    ....and so on
  end;
var
  MFTHeaderArray : MFTRecords;
  MFTRecordToParse : MFTRecordStructure;

const
  MFTMagicMarker = 'FILE0';
  HeaderLen = Length(MFTMagicMarker);
  ArraySize = SizeOf(MFTHeaderArray.FILE0MagicMarker);
...
begin
...
  while SourceFile.Read(MFTHeaderArray, ArraySize) = ArraySize do
  begin
    // If the following returns true, the buffer starts with 'FILE0' and we
    // probably have an MFT entry
    if CompareMem(@MFTMagicMarker[1], @MFTHeaderArray.FILE0MagicMarker[0], HeaderLen) then
    begin
      // The SourceFile position is always at the end of the last buffer read,
      // so to get the offset where 'FILE0' was found, subtract the buffer size
      // (ArraySize, i.e. 5 bytes) from the SourceFile position
      StartOfBufferPosition := SourceFile.Position - ArraySize;

      // One new record found, so tally it up.
      MFTEntryCounter := MFTEntryCounter + 1;
...
      // Now read in 1019 bytes from BEYOND where the FILE0 entry was found
      // and put them in the other record for later analysis
      SourceFile.Position := StartOfBufferPosition + HeaderLen;
      SourceFile.ReadBuffer(MFTRecordToParse, SizeOf(MFTRecordToParse));

      // And now I work on the data in the second packed record of arrays and output to memo or whatever
    end
    else  // CompareMem returned false
      // We didn't find 'FILE0' this time round the loop, so move the file
      // cursor 1 byte on from the start of the last read and loop again until EOF.
      SourceFile.Position := SourceFile.Position - ArraySize + 1;
  end;

It's not very quick and not especially efficient, but it finds all the records, and when it finds one, it reads a series of bytes after the entry that I can work with (and am working with).
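Most of the cost in the loop above is one small Read plus one seek per byte of the file. A different technique, sketched here under stated assumptions, is to read large chunks into memory and test each offset there with CompareMem, carrying the last `Length(Magic) - 1` bytes over to the next chunk so a match split across two reads is not lost. `ScanStream` and `TInt64Array` are hypothetical names, and the 1 MB default chunk size is just a guess.

```pascal
program ChunkedScanDemo;

{$mode objfpc}{$H+}

uses
  Classes, SysUtils;

type
  TInt64Array = array of Int64;

// Hypothetical helper: returns every stream offset at which Magic occurs.
// Reads ChunkSize bytes at a time and tests each offset in memory with CompareMem.
function ScanStream(Src: TStream; const Magic: AnsiString;
  ChunkSize: Integer = 1024 * 1024): TInt64Array;
var
  Buf: array of Byte;
  MagicLen, Carry, Got, Avail, I, Count: Integer;
  BufStart: Int64;  // stream offset of Buf[0]
begin
  Result := nil;
  MagicLen := Length(Magic);
  if (MagicLen = 0) or (ChunkSize < MagicLen) then
    Exit;
  SetLength(Buf, ChunkSize + MagicLen - 1);
  Src.Position := 0;
  BufStart := 0;
  Carry := 0;
  Count := 0;
  repeat
    Got := Src.Read(Buf[Carry], ChunkSize);
    Avail := Carry + Got;
    // Every offset where a complete match still fits inside the buffer
    for I := 0 to Avail - MagicLen do
      if CompareMem(@Buf[I], @Magic[1], MagicLen) then
      begin
        SetLength(Result, Count + 1);
        Result[Count] := BufStart + I;
        Inc(Count);
      end;
    // Move the unscanned tail (at most MagicLen-1 bytes) to the front
    Carry := MagicLen - 1;
    if Carry > Avail then
      Carry := Avail;
    Move(Buf[Avail - Carry], Buf[0], Carry);
    BufStart := BufStart + Avail - Carry;
  until Got = 0;
end;

var
  Data: AnsiString;
  MS: TMemoryStream;
  Hits: TInt64Array;
  K: Integer;
begin
  // 'FILE0' at offsets 0 and 6; ChunkSize = 8 forces the second match across two reads
  Data := 'FILE0xFILE0y';
  MS := TMemoryStream.Create;
  try
    MS.WriteBuffer(Data[1], Length(Data));
    Hits := ScanStream(MS, 'FILE0', 8);
    for K := 0 to High(Hits) do
      WriteLn(Hits[K]);   // prints 0, then 6
  finally
    MS.Free;
  end;
end.
```

The returned offsets can then be used the same way as StartOfBufferPosition above: seek to each one and ReadBuffer the record.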

As to why I'm doing this, I can't really say publicly. However, MFT entries store the files themselves within the MFT if they are small enough to fit inside the record (records are typically 1024 bytes), as well as created dates etc. This is called resident data. If they're larger than that, they're stored out in NTFS clusters, but data about them is still in the MFT.
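Since MFTRecordStructure starts right after the 5 'FILE0' bytes, it may help to see where the header fields sit. As far as I know, an NTFS FILE record begins with the 4-byte signature 'FILE', and the '0' is the low byte of the 16-bit update sequence (fixup) offset, which is $0030 (48 = Ord('0')) on records written by current NTFS versions. If that holds, a record read from byte 5 onwards starts in the middle of that field. The packed record below is a sketch of the first 32 header bytes under those assumptions; the type and field names are hypothetical, and the byte order assumes a little-endian (x86) machine.

```pascal
program MftHeaderDemo;

{$mode objfpc}{$H+}

type
  // Sketch of the first 32 bytes of an NTFS FILE record header as stored on
  // disk (little-endian). Field names are illustrative, not from any library.
  TMFTRecordHeader = packed record
    Signature: array[0..3] of AnsiChar;  // $00: 'FILE'
    UpdateSeqOffset: Word;               // $04: offset of the update sequence (fixup) array
    UpdateSeqCount: Word;                // $06: number of entries in that array
    LogFileSeqNumber: QWord;             // $08: $LogFile sequence number
    SequenceNumber: Word;                // $10
    HardLinkCount: Word;                 // $12
    FirstAttrOffset: Word;               // $14: offset of the first attribute
    Flags: Word;                         // $16: bit 0 = in use, bit 1 = directory
    UsedSize: LongWord;                  // $18: bytes of the record in use
    AllocatedSize: LongWord;             // $1C: bytes allocated, typically 1024
  end;

const
  // 'FILE' followed by the two bytes of UpdateSeqOffset = $0030
  Raw: array[0..5] of Byte = ($46, $49, $4C, $45, $30, $00);

var
  Hdr: TMFTRecordHeader;
begin
  FillChar(Hdr, SizeOf(Hdr), 0);
  Move(Raw[0], Hdr, SizeOf(Raw));     // copy the sample bytes over the header
  WriteLn(SizeOf(TMFTRecordHeader));  // 32
  WriteLn(Hdr.UpdateSeqOffset);       // 48 on a little-endian CPU, i.e. Ord('0')
end.
```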

Ted 
« Last Edit: April 10, 2012, 10:08:21 pm by tedsmith »

KpjComp

  • Hero Member
  • Posts: 680
Re: Why is my packed record structure and CompareMem function failing?
« Reply #18 on: April 10, 2012, 11:36:51 pm »
Hi Ted,

No problem..
If you want to speed things up, it's not much of a modification to use the BlockSearch class I've created. You don't need to understand how TBlockSearch works, only that it returns a list of file positions that you can then use. Basically, take out your current while loop and put the following in its place.

Code:
var
  bs:TBlockSearch;
  i:integer;
..
..
  // block_search must be in the uses clause
  bs := TBlockSearch.Create(SourceFile);
  try
    // SearchFor takes an array of byte, not a string, so pass the bytes of 'FILE0'
    bs.SearchFor([$46, $49, $4C, $45, $30]);
    for i := 0 to bs.Results.Count-1 do
    begin
      // skip the 5 'FILE0' bytes, then read the rest of the record
      SourceFile.Position := bs.Results.Items[i] + 5;
      SourceFile.ReadBuffer(MFTRecordToParse, SizeOf(MFTRecordToParse));
      // And now I work on the data in the second packed record of arrays and output to memo or whatever
    end;
  finally
    bs.Free;
  end;
 

BigChimp

  • Hero Member
  • Posts: 5740
  • Add to the wiki - it's free ;)
    • FPCUp, PaperTiger scanning and other open source projects
Re: Why is my packed record structure and CompareMem function failing?
« Reply #19 on: April 11, 2012, 10:39:39 am »
Kpjcomp,

Thanks for your code.

I added it to the wiki:
http://wiki.lazarus.freepascal.org/Rosetta_Stone

Hope you don't mind.

Thanks,
BigChimp
Want quicker answers to your questions? Read http://wiki.lazarus.freepascal.org/Lazarus_Faq#What_is_the_correct_way_to_ask_questions_in_the_forum.3F

Open source including papertiger OCR/PDF scanning:
https://bitbucket.org/reiniero

Lazarus trunk+FPC trunk x86, Windows x64 unless otherwise specified

KpjComp

  • Hero Member
  • Posts: 680
Re: Why is my packed record structure and CompareMem function failing?
« Reply #20 on: April 11, 2012, 02:19:14 pm »
Hope you don't mind.

No problem, good idea! Hope others find it useful.  :)

 
