FILE ORGANIZATION

File organization refers to the way records are arranged (laid out) within a particular file.

The term file organization can also refer to the relationship of the Key of a record to the physical location of that record in the computer file.

File organization is very important because it determines the method of access, efficiency, flexibility, and the storage devices to be used.

Methods of file organization.

There are 4 methods by which records of a file can be arranged and accessed.  These are:

  1. Random.
  2. Serial.
  3. Sequential.
  4. Indexed sequential.

Random (Direct) file organization.

In Random or direct file organization, the records are stored in the file randomly, in no particular order.  This implies that there is no relationship between two adjacent records.

An Algorithm (mathematical procedure) is applied to the record key to generate the address of the location where the record will be stored.

  Record 2 K2 Record 3 K3 Record 8 K8 Record 92 K92 Record 1 K1  

      K’s – Record keys.

Random files are usually accessed directly.  To access the file, the record key is used to determine where a record is stored on the storage media.  Once the record is located, it is then read into the computer memory.

This method is used on Magnetic disks and Optical disks.
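The address-generation step can be sketched as follows.  This is a minimal illustration only: it assumes a division-remainder (modulo) algorithm and a file of 11 record locations, neither of which is prescribed by these notes.  The names `NUM_SLOTS`, `address_of`, `store` and `fetch` are hypothetical, and a collision (two keys producing the same address) is only detected, not resolved.

```python
NUM_SLOTS = 11  # assumed number of record locations in the file


def address_of(key: int) -> int:
    """Apply the algorithm to the record key to get a storage address."""
    return key % NUM_SLOTS


def store(slots: list, key: int, data: str) -> None:
    """Place a record at the address generated from its key."""
    addr = address_of(key)
    if slots[addr] is not None and slots[addr][0] != key:
        raise ValueError(f"collision at address {addr}")
    slots[addr] = (key, data)


def fetch(slots: list, key: int):
    """Go straight to the computed address; no other record is read."""
    record = slots[address_of(key)]
    if record is not None and record[0] == key:
        return record[1]
    return None


slots = [None] * NUM_SLOTS
for key in (2, 3, 8, 92, 1):          # keys taken from the diagram above
    store(slots, key, f"Record {key}")
```

With these assumptions, record 92 lands at address 4 (92 mod 11), so adjacent locations hold unrelated records, and reading it back needs only the key.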

Advantages of Random file organization.

  1. Records are quickly accessed (i.e. there is fast access to records).
  2. Files are easily updated (i.e. adding, deleting, and amending the records is easily achieved).
  3. The method does not require the use of indexes, hence saving space.
  4. Transactions do not need to be sorted before being updated.
  5. New records can be easily inserted into a random file.

Disadvantages of Random file organization

  1. Data may be accidentally erased or overwritten unless special precautions are taken.
  2. Random files are less efficient in the use of storage space compared to sequentially organized files.
  3. Expensive hardware and software resources are required.
  4. Programming is relatively complex.
  5. System design based on random file organization is complex and costly.

Serial file organization.

In Serial file organization, records in a file are stored one after the other in the order they come into the file without any particular sequence.  The records are not sorted in any way on the storage medium, and there is no relationship that exists between adjacent records.

This type of organization is mostly used on Magnetic tapes.

  Record 1 | IRG | Record 2 | IRG | Record 3

  File ‘head’                            File ‘tail’

      IRG – Inter-Record Gap.

Serial files can be accessed serially.  This involves searching through the entire file record by record starting from the ‘head’ of the file towards the ‘tail’ of the file.

Note.  Serial access is suitable where all the records in the file are to be read.  This is because even the records that are not required must be passed over before the record of interest is located.  E.g., to access the 10th record in the file, the computer must read the first 9 records before reading the 10th record.
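The cost of serial access can be illustrated with a small sketch.  It assumes the file is modelled as a Python list held in arrival order; `read_serially` is a hypothetical helper that counts how many records are read before the wanted one is reached.

```python
def read_serially(file_records, wanted_position):
    """Read from the head of the file until the wanted record is reached.

    Returns the record and the number of records read to reach it.
    """
    reads = 0
    for record in file_records:       # always starts at the file 'head'
        reads += 1
        if reads == wanted_position:
            return record, reads
    return None, reads                # reached the 'tail' without finding it


# Records are stored in arrival order, with no sorting on any key.
serial_file = [f"Record {n}" for n in (14, 3, 27, 8, 19, 1, 22, 5, 11, 30)]
record, reads = read_serially(serial_file, 10)
```

Here the 10th record is returned only after all 10 records have been read, including the 9 that were not required.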

Sequential file organization.

In Sequential file organization, the records are also arranged within the file one after the other.  However, unlike in serial organization, the records are stored in a particular order, sorted on a key field; hence, there is a relationship between adjacent records based on their key fields.

  Record 1 K1 Record 2 K2 Record 3 K3 Record 4 K4  

      K1 – K4 – Record keys.

Sequential files are accessed sequentially, i.e. the key field is used to search for the particular record required.  Searching starts at the beginning of the file and proceeds sequentially towards the ‘tail’ of the file, until the required record is located. 
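A minimal sketch of sequential access, assuming the records are held as (key, data) pairs sorted by key.  Because the file is in key order, the search can stop as soon as a larger key is met; this early exit is an assumption of the sketch rather than something stated in these notes, and `find_sequential` is a hypothetical name.

```python
def find_sequential(sorted_file, wanted_key):
    """Scan from the head of a key-ordered file for wanted_key.

    Returns (data, records_read); data is None if the key is absent.
    """
    reads = 0
    for key, data in sorted_file:
        reads += 1
        if key == wanted_key:
            return data, reads
        if key > wanted_key:          # passed the place where it would be
            break
    return None, reads


sorted_file = [(1, "Record 1"), (2, "Record 2"), (4, "Record 4"), (7, "Record 7")]
```

For example, `find_sequential(sorted_file, 4)` reads three records before it finds the match, and a search for the missing key 3 also stops after three records instead of running to the ‘tail’.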

Advantages of Sequential organization.

  1. The method is simple & easy to understand.
  2. Sequential files are easy to organize and maintain.
  3. Loading or reading a record requires only the Record Key.
  4. It is efficient & economical if the number of file records to be processed is high.
  5. Relatively inexpensive Input/Output media and devices may be used.
  6. Errors in the files remain localized.

Disadvantages of Sequential organization.

  1. The entire file must be processed even when the number of file records to be processed is low.
  2. Transactions must be sorted in the sequence of the Master file before they can be processed or updated.
  3. Data redundancy/idleness is high since the same data may be stored in several files sequenced in different keys.
  4. Random enquiries are almost impossible to handle.

Indexed Sequential file organization.

The records are arranged sequentially as in sequential files.  However, indexed sequential files have an Index that enables the computer to locate individual records on the storage media.

An Index is the address of a particular cylinder or track.  The indexes are used to point at the portions where the records are stored in groups.  This allows a group of records that are not required in a particular processing run to be bypassed.

                  a                                  b                                    c

Record 1 K1 Record 2 K2 Record 3 K3 Record 4 K4 Record 5 K5 Record 6 K6

      a, b, c – indexes.               K1-K6 – Record keys.

To access a record in an indexed sequential file, the Index and the record’s key field are used by the computer to search for the required record before it is read into the computer memory.
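The lookup can be sketched as below.  The sketch assumes that each index entry holds the highest key stored in its group of records (a common arrangement, but not one specified in these notes), and it models the groups a, b and c from the diagram as Python lists.  `groups`, `index` and `find_indexed` are hypothetical names.

```python
from bisect import bisect_left

# Records in key order, stored in groups (e.g. one group per track).
groups = [
    [(1, "Record 1"), (2, "Record 2")],   # group a
    [(3, "Record 3"), (4, "Record 4")],   # group b
    [(5, "Record 5"), (6, "Record 6")],   # group c
]

# Index: the highest key held in each group, in group order.
index = [group[-1][0] for group in groups]


def find_indexed(wanted_key):
    """Use the index to pick one group, then search only that group."""
    g = bisect_left(index, wanted_key)    # first group whose highest key >= wanted_key
    if g == len(groups):
        return None                       # key lies beyond the last group
    for key, data in groups[g]:
        if key == wanted_key:
            return data
    return None
```

A request for key 5 consults the index, skips groups a and b entirely, and searches only group c.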

Methods of accessing Indexed sequential files.

Indexed sequential files may be accessed using 3 methods:

  • Sequential access.
  • Selective sequential access.
  • Random access.

Sequential access:

In sequential access, the computer reads the records in sequential order (i.e., one record after the other) using the index until the record matching the search key is found.  The record is then read into the Main memory.

Sequential access is suitable for high activity files.

Selective Sequential access:

In selective sequential access, the transaction file must first be sorted into the same key sequence as the master file.  The access mechanism then goes forward in an ordered progression (sequence), and only those records needed are read/processed.

The method is suitable for low activity files.
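Selective sequential access can be sketched as a single forward pass that reads only the groups holding a wanted key.  As an assumption of the sketch, a group is bypassed by comparing the next wanted key with the highest key in that group; `selective_sequential` and `groups` are hypothetical names.

```python
def selective_sequential(groups, wanted_keys):
    """Single forward pass: read only the groups that hold a wanted key.

    groups      -- list of key-ordered groups of (key, data) records
    wanted_keys -- keys required by the transaction file (any order)
    Returns (found_records, number_of_groups_read).
    """
    wanted = sorted(wanted_keys)          # sort into master-file key sequence
    found, groups_read, w = [], 0, 0
    for group in groups:                  # access mechanism only moves forward
        if w == len(wanted):
            break                         # nothing left to look for
        highest = group[-1][0]
        if wanted[w] > highest:
            continue                      # bypass: no wanted key in this group
        groups_read += 1
        for key, data in group:
            while w < len(wanted) and wanted[w] < key:
                w += 1                    # this wanted key is not in the file
            if w < len(wanted) and wanted[w] == key:
                found.append(data)
                w += 1
    return found, groups_read


groups = [
    [(1, "Record 1"), (2, "Record 2")],
    [(3, "Record 3"), (4, "Record 4")],
    [(5, "Record 5"), (6, "Record 6")],
]
```

With transactions for keys 6 and 1, the pass reads the first and third groups and bypasses the second, which is why the method suits runs that touch only a few records.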

Random (direct) access:

In random access, the records are not processed in any particular sequence of the key field.  This means that the records can be processed in any order, i.e., by moving the access mechanism forwards and backwards along the file in a non-orderly manner to reach the records required.

The method is suitable for low activity files.

Advantages of Indexed sequential file organization

  1. Records can be accessed sequentially or randomly.
  2. Accessing of records can be fast, if done randomly.
  3. Records are not duplicated.

Disadvantages of Indexed sequential file organization.

  1. Accessing of records sequentially is time consuming.
  2. Processing of records sequentially may introduce redundancy/idleness.
  3. It requires an expensive storage medium.

Comparison of File organization methods.

Method              | Method of access                         | Medium used                  | Example of file
Random              | Random                                   | Magnetic disk, Optical disk  | Master files requiring fast reference or enquiry.
Serial              | Serial                                   | Magnetic tape, Magnetic disk | Unsorted transaction file.
Sequential          | Serial (Sequential)                      | Tape, Disk                   | Sorted transaction file, or Sequential Master file.
Indexed sequential  | Sequential, Selective sequential, Random | Magnetic disk                | Master files requiring various processing activities.

File organization & access on a Magnetic Tape.

On a Magnetic tape, the file records are placed one after the other along the tape.

There are 2 ways in which files are arranged on tapes:

  1. Serial:

In serial organization, the records are written onto the tape without having any relationship between the record keys.

  • Serial files on a tape are accessed serially, i.e., each record is read from the tape into main storage one after the other in the order they occur on the tape.

  2. Sequential:

In Sequential organization, the records are written onto tape in sequence according to the record keys.  Sequential files are accessed sequentially.

Explanation;

To process a sequential Master file on a tape, the transaction file must be in the sequence of the Master file.  A transaction is read first, and the Master file is then read until the matching record is found.  E.g., if the record required is the 20th record of the file, the computer must first read all 19 preceding records.
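The update run described above can be sketched as a single pass over both files.  The sketch assumes that every transaction is an amendment that replaces the data of the matching master record, and that the updated records are written out as a new master file; `update_master` is a hypothetical name.

```python
def update_master(master, transactions):
    """Apply amendments to a key-ordered master file in one pass.

    master       -- list of (key, data), sorted by key
    transactions -- list of (key, new_data), in any order
    Returns (new_master, unmatched_transaction_keys).
    """
    # Sort the transactions into master-file sequence (stable for equal keys).
    transactions = sorted(transactions, key=lambda tr: tr[0])
    new_master, unmatched = [], []
    t = 0
    for key, data in master:              # the master is read once, head to tail
        # Transactions with a smaller key have no master record to match.
        while t < len(transactions) and transactions[t][0] < key:
            unmatched.append(transactions[t][0])
            t += 1
        # Apply every transaction for this key (the last one wins).
        while t < len(transactions) and transactions[t][0] == key:
            data = transactions[t][1]
            t += 1
        new_master.append((key, data))
    unmatched.extend(k for k, _ in transactions[t:])
    return new_master, unmatched


master = [(1, "A"), (2, "B"), (3, "C")]
transactions = [(3, "C2"), (1, "A2"), (5, "X")]
new_master, unmatched = update_master(master, transactions)
```

Because both files are in the same key sequence, neither file is ever re-read from the ‘head’; the transaction for key 5 has no matching master record and is reported as unmatched.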

File organization & access on a Magnetic Disk.

There are 4 basic methods of organizing files on a Magnetic disk:

  1. Serial:

The records are placed onto the disk one after the other with no regard for sequence.

  • Serial files on a disk are accessed serially, i.e., each record is read from the disk into main storage one after the other in the order they occur on the disk.

  2. Sequential:

In sequential organization, the records are written onto the disk in a defined sequence according to the record keys.

  • The Sequential method of access is used to read a sequential disk file.

  3. Random:

In random organization, the records are placed onto the disk “randomly” (i.e., there is no obvious relationship between the records).

A mathematical formula is applied to the record key to generate the address of the location where the record is placed on the disk.  During processing, the same record key is used to generate the address, which shows the location from which the record is read.

  • The method of access to random files is Random (direct) access.

  4. Indexed Sequential:

In Indexed Sequential organization, the records are stored in sequence, but an Index (key field/guide) is provided to enable individual records to be located.  In this case, the index will always enable the sequence of the records to be determined.

  • Indexed sequential files can be accessed using the sequential access, selective sequential access, or random access method.

Factors to consider when choosing the type of file organization to use.

  1. Frequency of update.

The file designer should determine how often the file is going to require updating.

For periodic updates (e.g., a monthly update), the transactions are used to update the master files in one run.  For non-periodic systems, the transactions may be applied at any time as required.

The file design selected should therefore be able to support the update strategy, and at the required time.

  2. File activity.

The type of file organization adopted should be based on the expected number of records to be processed/accessed in a particular run.

  3. Method of file access.

This refers to the method the computer will use to transfer the contents of the file from the storage media into the computer.

  4. Nature of the system.

Before designing the file(s) to be maintained by a computer system, you have to consider whether the system runs periodically or is an event-driven system.

In periodically run systems, all transactions relating to a particular business are accumulated over a period of time, after which they are applied to the relevant master files in a single run.  Such systems produce periodic reports from the maintained files.

On the other hand, event-driven systems allow file enquiries and instant updates as soon as the transactions are available, so that the maintained master files can be used to produce instant information.

  5. Medium for storing the Master file.

Computer files are stored on storage media.  The type of file organization adopted depends on the medium that will be used to store the computer file.

E.g., Serial access devices, such as Magnetic Tapes, cannot be used to store Random files or Indexed-sequential files.  This is because searching for the particular record required proceeds serially regardless of the file organization method used.
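The file activity factor listed above is often expressed as the percentage of records accessed in one run.  The sketch below assumes that definition; `activity_ratio` is a hypothetical helper, the figures are made up for illustration, and no particular cut-off between "high" and "low" activity is implied.

```python
def activity_ratio(records_accessed: int, records_in_file: int) -> float:
    """Percentage of the file's records accessed in one processing run."""
    if records_in_file <= 0:
        raise ValueError("file must contain at least one record")
    return 100.0 * records_accessed / records_in_file


payroll_run = activity_ratio(950, 1000)   # most records touched: high activity
enquiry_run = activity_ratio(12, 1000)    # few records touched: low activity
```

A run like the first favours sequential processing, while a run like the second favours selective sequential or random access.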

Review Questions.

  1. What do you mean by File Organization?
  2. State and explain four types of file organization.
  3. Distinguish between:
     (a). Sequential and serial file organization methods.
     (b). Random and indexed-sequential file organization methods.
  4. (a). Describe how files are organized and accessed on tape.
     (b). What are the disadvantages of storing files on tape?
  5. Differentiate between Sequential and Indexed Sequential methods of file organization on disk.
  6. (a). What is random file organization?  State its advantages.
     (b). How are Random files accessed on disk?
  7. Identify four file processing methods.
  8. Discuss four considerations for choosing a file organization method.