Proposal for modification of bacula catalogue

What

To add two or three new fields to the 'file' table.

  • size INTEGER (64b)
  • mtime Timestamp
  • ctime Timestamp

These fields would not be indexed.
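
A minimal DDL sketch of the proposed change, assuming PostgreSQL and the existing 'file' table; the column names (st_size, st_mtime, st_ctime) are only illustrative, not part of the proposal:

  -- Proposed columns (names illustrative); deliberately left unindexed
  ALTER TABLE file
      ADD COLUMN st_size  BIGINT,     -- file size, 64-bit integer
      ADD COLUMN st_mtime TIMESTAMP,  -- last-modification time
      ADD COLUMN st_ctime TIMESTAMP;  -- creation / inode-change time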

Why

At present, some important information about each file is compressed and stored as text in the 'lstat' field. This is not queryable, so it is impossible to run a query against the catalogue to find information such as:

  • The size of a file.
  • The date and time it was last modified.
  • The date and time it was created.
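
For example, with columns like those sketched above in place, a query along these lines becomes possible (the st_* names are the illustrative ones from the sketch; without them this information stays locked inside the text of 'lstat'):

  -- Find large, recently modified files straight from the catalogue
  SELECT fileid, st_size, st_mtime
  FROM   file
  WHERE  st_size  > 1073741824        -- larger than 1 GiB
    AND  st_mtime > '2008-01-01'
  ORDER BY st_size DESC;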

Why Not

Kern is keen to limit the impact on the size and speed of the catalogue, particularly in the context of very large databases, where the 'file' table may hold 600M rows for a nominal database size of 100 GB.

Generating a size value is difficult on some (unusual) systems; in that case it can be set to the number of blocks, or left as NULL or zero. These additional fields are an aid to querying, not a requirement for operation.

Questions

Is there any use case for the ctime field? If not, we can drop it.

Testing

Running tests on 100 GB of data is tiresome, so I'm testing with a 60M-row subset.

The methodology is as follows; the tests use PostgreSQL 8.3.

Use the attached simple program to generate four test files in COPY STDIN format: 60M rows and 1M rows, for both the existing and the proposed schemas.
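
The generator program itself isn't reproduced here; as a purely hypothetical alternative, rows in COPY text format can also be fabricated inside PostgreSQL along these lines (column list illustrative; redirect the output to a file):

  -- Emit 1M fabricated rows in COPY text format
  COPY (
      SELECT i                                    AS jobid,
             'dummy lstat text'                   AS lstat,
             (random() * 1e9)::BIGINT             AS st_size,
             now() - i * INTERVAL '1 second'      AS st_mtime,
             now() - i * INTERVAL '1 second'      AS st_ctime
      FROM generate_series(1, 1000000) AS i
  ) TO STDOUT;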

  • Create a new DB.
  • Load up the 60M rows.
  • Vacuum on general principles.
  • Create indexes.
  • Wait till IO stops.
  • Time the execution of the 1M-row insert script.
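
A sketch of one such run as a psql session, assuming the test database already exists; the file names and the index shown are placeholders, not the actual attached scripts:

  \copy file FROM 'file_60m.copy'                 -- bulk-load the 60M rows
  VACUUM ANALYZE file;                            -- vacuum on general principles
  CREATE INDEX file_jobid_idx ON file (jobid);    -- recreate the usual indexes
  -- ...wait until IO stops, then time the 1M-row load:
  \timing
  \copy file FROM 'file_1m.copy'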

The scripts create the 'file' table in the 'bacula2' tablespace; in my case this is on a large SATA drive. I won't attach the scripts here because the wiki won't let me upload them; check the Dev mailing list for 7 Oct.
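
For readers who can't easily find that post, the table-creation step is roughly of this shape (heavily abbreviated; the real column list is Bacula's standard 'file' schema, and the tablespace path is illustrative):

  CREATE TABLESPACE bacula2 LOCATION '/mnt/sata/pgdata';
  CREATE TABLE file (
      fileid    BIGSERIAL PRIMARY KEY,
      jobid     INTEGER NOT NULL,
      lstat     TEXT    NOT NULL,
      -- ...remaining standard Bacula columns...
      st_size   BIGINT,         -- proposed
      st_mtime  TIMESTAMP,      -- proposed
      st_ctime  TIMESTAMP       -- proposed
  ) TABLESPACE bacula2;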

Results

Please add your results here.

Machine: HP ML150 G3, 1 GB RAM, 64 MB shared_buffers, 16 checkpoint_segments

  • original schema: 520 minutes
  • modified schema: 532 minutes (+2.3%)

Size difference

My phpPgAdmin reported 11.1 GB in both cases. The 1M-row insert files are 63 MB and 103 MB; unsurprising, as datetimes in ISO text format are very verbose. Perhaps someone can calculate the extra length of the tuple.
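
A rough calculation, assuming the new columns are BIGINT and TIMESTAMP WITHOUT TIME ZONE: each of those types occupies 8 bytes on disk in PostgreSQL, so the three columns add on the order of 24 bytes per tuple (before alignment padding), even though they account for roughly 40 extra bytes per row in the ISO-formatted text files. The per-type widths can be checked directly:

  -- On-disk width of the proposed column types
  SELECT pg_column_size(1234567890::BIGINT)            AS size_bytes,   -- 8
         pg_column_size(TIMESTAMP '2008-10-07 03:48')  AS stamp_bytes;  -- 8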

Conclusion

Premature, but I suspect that when the number of tuples is 'large', the DB spends more and more of its time maintaining the indexes, so the effort of parsing the ISO-format dates becomes less significant as the tuple count grows. Comments?
