The following information provides a general rule for estimating the size of a database:
After reading these sections, you should read "Designing and Choosing Table Spaces".
Information is not provided for the space required by such things as:
Information such as row size and structure is precise. However, the multiplication factors for file overhead caused by disk fragmentation, free space, and variable-length columns will vary in your own database, because there is such a wide range of possible column types and row lengths. After initially estimating your database size, create a test database and populate it with representative data; you will then find a multiplication factor that is more accurate for your particular database design.
When a database is initially created, system catalog tables are created. The system tables will grow as database objects and privileges are added to the database. Initially, they use approximately 2.5 MB of disk space.
The amount of space allocated for the catalog tables depends on the type of table space and the extent size for the table space containing the catalog tables. For example, if a DMS table space with an extent size of 32 is used, the catalog table space will initially be allocated 20 MB of space. For more information, see "Designing and Choosing Table Spaces".
Note: | For databases with multiple partitions, the catalog tables reside only on the partition where the CREATE DATABASE command was issued. Disk space for the catalog tables is required only for that partition. |
By default, table data is stored on 4 KB pages. Each 4 KB page contains 76 bytes of overhead for the database manager. This leaves 4020 bytes to hold user data (or rows), although no row can exceed 4005 bytes in length. A row will not span multiple pages. You can have a maximum of 500 columns when using a 4 KB page size.
Note that the table data pages do not contain the data for columns defined with LONG VARCHAR, LONG VARGRAPHIC, BLOB, CLOB, or DBCLOB data types. The rows in a table data page do, however, contain a descriptor of these columns. (See "Long Field Data" and "Large Object (LOB) Data" for information about estimating the space required for the table objects that will contain the data stored using these data types.)
Typically, rows are inserted into the table in approximate first-fit order. The file is searched (using a free space map) for the first available space that is large enough to hold the new row. When a row is updated, it is updated in place unless there is insufficient room left on the page to contain it. In that case, a record is created in the original row location that points to the new location of the updated row in the table file.
If the ALTER TABLE APPEND ON statement is used, then data will always be appended and information about any free space on the data pages will not be kept.
See "Long Field Data" and "Large Object (LOB) Data" for information about how LONG VARCHAR, LONG VARGRAPHIC, BLOB, CLOB and DBCLOB data is stored and for estimating the space required to store these types of columns.
For each user table in the database, the number of 4 KB pages can be estimated by calculating:
ROUND DOWN(4020/(average row size + 10)) = records_per_page
Then use records_per_page with:
(number_of_records/records_per_page) * 1.1 = number_of_pages
Note: | This formula is only an estimate and is not guaranteed to be accurate. Accuracy of the estimate lessens if the length of records varies due to fragmentation and overflow records. |
The average row size is the sum of the average column sizes. For information on the size of each column, see CREATE TABLE in the SQL Reference.
The factor of "1.1" is for overhead.
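As a sketch only, the two formulas above can be combined as follows (the function and variable names are illustrative, not part of the product):

```python
import math

def estimate_table_pages(number_of_records, average_row_size):
    """Estimate 4 KB data pages: ROUND DOWN(4020 / (average row size + 10))
    gives records_per_page; the 1.1 factor is for overhead."""
    records_per_page = math.floor(4020 / (average_row_size + 10))
    return (number_of_records / records_per_page) * 1.1

# For example, 1,000,000 rows averaging 100 bytes each gives
# 36 records per page and roughly 30,556 pages.
```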
You also have the option to create buffer pools or table spaces that have an 8 KB page size. All tables created within such a table space use 8 KB pages. A single table or index object can then be as large as 128 GB (the maximum is 64 GB for a single table or index object with a 4 KB page size). You can have a maximum of 1012 columns when using an 8 KB page size. Rows can be up to 8101 bytes in length.
Having a larger page size allows for a possible reduction in the number of levels in any index. If you are working with OLTP applications, which perform random row reads and writes, a smaller page size is better because it wastes less buffer space on unwanted rows. If you are working with DSS applications, which access large numbers of consecutive rows at a time, a larger page size is better because it reduces the number of I/O requests required to read a specific number of rows. An exception to this latter case occurs when the row size is smaller than the page size divided by 256; in that case, there is wasted space on each page. (Recall that there can be a maximum of only 256 rows per page.) To reduce this wasted space, a smaller page size may be more appropriate.
There are some restrictions when using an 8 KB page size. Using the extended storage cache is not allowed. LOB and LONG data cannot reside on pages 8 KB in size. When conducting backup and restore operations, a table space restore cannot be done from a 4 KB page size to an 8 KB page size; nor the reverse. In addition, you cannot import IXF data files that represent more than 755 columns.
If a table has LONG VARCHAR or LONG VARGRAPHIC data, in addition to the byte count of 20 for the LONG VARCHAR or LONG VARGRAPHIC descriptor (in the table row), the data itself must be stored. Long field data is stored in a separate table object which is structured differently from the other data types (see "User Table Data" and "Large Object (LOB) Data").
Data is stored in 32 KB areas that are broken up into segments whose sizes are "powers of two" times 512 bytes. (Hence these segments can be 512 bytes, 1024 bytes, 2048 bytes, and so on, up to 32,700 bytes.)
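Assuming a value occupies the smallest such segment that can hold it (a simplification; the actual allocation strategy may differ), the segment size can be sketched as:

```python
def long_field_segment_size(data_length):
    """Smallest power-of-two multiple of 512 bytes that holds the data.

    Sketch only: real allocation caps segments at 32,700 bytes and may
    differ in detail.
    """
    segment = 512
    while segment < data_length:
        segment *= 2
    return segment

# A 3000-byte long field value would occupy a 4096-byte segment,
# leaving 1096 bytes of the segment unused.
```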
Each of these data types is stored in a fashion that enables free space to be reclaimed easily. Allocation and free space information is stored in 4KB allocation pages, which appear infrequently throughout the object.
The amount of unused space in the object depends on the size of the long field data and whether this size is relatively constant across all occurrences of the data. For data entries larger than 255 bytes, this unused space can be up to 50 percent of the size of the long field data.
If character data is less than 4KB in length, and it fits in the record with the rest of the data, the CHAR, GRAPHIC, VARCHAR, or VARGRAPHIC data types should be used instead of LONG VARCHAR or LONG VARGRAPHIC.
If a table has BLOB, CLOB, or DBCLOB data, in addition to the byte count (between 72 and 312 bytes) for the BLOB, CLOB, or DBCLOB descriptor (in the table row), the data itself must be stored. This data is stored in two separate table objects that are structured differently than other data types (see "User Table Data" and "Long Field Data").
To estimate the space required by large object data, you need to consider the two table objects used to store data defined with these data types:
Data is stored in 64MB areas that are broken up into segments whose sizes are "powers of two" times 1024 bytes. (Hence these segments can be 1024 bytes, 2048 bytes, 4096 bytes, and so on, up to 64MB.)
To reduce the amount of disk space used by the LOB data, you can use the COMPACT parameter on the lob-options-clause on the CREATE TABLE and ALTER TABLE statements. The COMPACT option minimizes the amount of disk space required by allowing the LOB data to be split into smaller segments so that it will use the smallest amount of space possible. This does not involve data compression but is simply using the minimum amount of space to the nearest 1KB boundary. Without the COMPACT option, there is no attempt to reduce the space used to the nearest 1KB boundary. Appending to LOB values stored using the COMPACT option may result in slower performance compared to appending LOB values for which the COMPACT option is not specified.
The amount of free space contained in LOB data objects will be influenced by the amount of update and delete activity, as well as the size of the LOB values being inserted.
Allocation and free space information is stored in 4 KB allocation pages separated from the actual data. The number of these 4 KB pages depends on the amount of data, including unused space, allocated for the large object data. The overhead is calculated as follows: one 4 KB page for every 64 GB, plus one 4 KB page for every 8 MB.
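As a rough sketch of that rule (the function name is illustrative, not part of the product):

```python
import math

def lob_allocation_pages(allocated_bytes):
    """Estimate 4 KB allocation pages for LOB data: one page per 64 GB
    of allocated space plus one page per 8 MB, as described above."""
    return (math.ceil(allocated_bytes / (64 * 1024**3))
            + math.ceil(allocated_bytes / (8 * 1024**2)))

# 1 GB of allocated LOB space needs about 1 + 128 = 129 allocation pages.
```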
If character data is less than 4KB in length, and it fits in the record with the rest of the data, the CHAR, GRAPHIC, VARCHAR, or VARGRAPHIC data types should be used instead of BLOB, CLOB or DBCLOB.
For each index, the space needed can be estimated as:
(average index key size + 8) * number of rows * 2
where:
Note: | For every column that allows nulls, add one extra byte for the null indicator. |
Temporary space is required when creating the index. The maximum amount of temporary space required during index creation can be estimated as:
(average index key size + 8) * number of rows * 3.2
Where the factor of 3.2 is for index overhead as well as space required for the sorting needed to create the index.
Note: | In the case of non-unique indexes, only four (4) bytes are required to store duplicate key entries. The estimates shown above assume no duplicates. The space required to store an index may be over-estimated by the formula shown above. |
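The two index estimates above can be sketched as follows (illustrative names; as noted, no adjustment is made for duplicate keys, so the result may be an over-estimate):

```python
def index_space(average_key_size, number_of_rows):
    """Estimated index size in bytes: (average index key size + 8) * rows * 2."""
    return (average_key_size + 8) * number_of_rows * 2

def index_temp_space(average_key_size, number_of_rows):
    """Peak temporary space during index creation: (key size + 8) * rows * 3.2."""
    return (average_key_size + 8) * number_of_rows * 3.2

# A 10-byte average key over 1,000,000 rows needs roughly 36 MB for the
# index and about 57.6 MB of temporary space while it is being built.
```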
The following two calculations can be used to estimate the number of leaf pages. The results are only an estimate, and their accuracy depends largely on how well the averages used reflect the actual data.
Note: | For SMS, the minimum space is 12KB. For DMS, the minimum is an extent. |
Following are two methods that you can use when calculating index space. The first method is a rough estimate, while the second method provides a more accurate estimate:
(.9 * (U - (M*2))) * (D + 1)
----------------------------
      K + 6 + (4 * D)
where:
Remember that minimumKeySize and averageKeySize must include an extra byte for each nullable key part, and an extra byte for the length of each variable-length key part.
If there are include columns, they should be accounted for in minimumKeySize and averageKeySize.
The .9 can be replaced by any (100 - pctfree)/100, if a percent free other than the default of ten (10) percent was specified during the index creation.
L = number of leaf pages = X / (avg number of keys on leaf page)
Where X is the total number of rows in the table.
You can estimate the original size of an index as:
(L + 2L/(average number of keys on leaf page)) * pagesize
For DMS tablespaces, add together the total sizes for all indexes on a table, and round up to a multiple of the extent size for the table space where the index resides.
You should provide additional space for index growth due to INSERT/UPDATE activity, which may result in page splits.
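Under the definitions above (L = X divided by the average number of keys per leaf page), the original index size estimate can be sketched as follows (names are illustrative):

```python
def estimate_index_size(total_rows, avg_keys_per_leaf_page, pagesize=4096):
    """Original index size: (L + 2L / (avg keys per leaf page)) * pagesize,
    where L is the estimated number of leaf pages."""
    L = total_rows / avg_keys_per_leaf_page
    return (L + 2 * L / avg_keys_per_leaf_page) * pagesize

# 1,000,000 rows at 100 keys per leaf page gives L = 10,000 leaf pages
# and an estimated index size of about 41.8 MB.
```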
Use the following calculations to obtain a more accurate estimate of the original index size, as well as an estimate of the number of levels in the index. (This may be of particular interest if include columns are being used in the index definition.) The average number of keys per non-leaf page is roughly:
(.9 * (U - (M*2))) * (D + 1)
----------------------------
      K + 12 + (8 * D)
Where:
The minimumKeySize and averageKeySize will be the same as on leaf pages, except when there are include columns. Include columns are not stored on non-leaf pages, so their size should be excluded from minimumKeySize and averageKeySize for the non-leaf page calculations.
You should not replace .9 with (100 - pctfree)/100 unless this value is greater than .9, because a maximum of 10% free space will be left on non-leaf pages during index creation.
The number of non-leaf pages can be estimated as follows:
if L > 1 then {P++; Z++}
While (Y > 1)
{
    P = P + Y
    Y = Y / N
    Z++
}
So the total number of pages is T = (L + P + 2) * 1.0002. The additional .02% is for overhead such as space map pages.
The amount of space required to create the index is estimated as T * pagesize, and the number of levels in index tree is estimated to be Z.
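A sketch of the whole calculation, assuming P starts at 0, Z starts at 1, and Y = L / N, where N is the average number of keys per non-leaf page (these initializations are inferred, since the variable definitions are abbreviated here):

```python
def estimate_index_tree(L, N):
    """Estimate non-leaf pages P, index levels Z, and total pages T,
    following the loop above.

    L: estimated leaf pages; N: average keys per non-leaf page.
    Assumes P = 0, Z = 1, and Y = L / N initially (inferred).
    """
    P, Z = 0, 1
    Y = L / N
    if L > 1:
        P += 1
        Z += 1
    while Y > 1:
        P += Y
        Y /= N
        Z += 1
    T = (L + P + 2) * 1.0002  # extra .02% for overhead such as space map pages
    return P, Z, T

# With L = 10,000 leaf pages and N = 100 keys per non-leaf page, this
# gives P = 101 non-leaf pages and a three-level index tree.
```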