Free-Space Management
Another important feature of disk management is keeping track of and allocating free space.
Bit Vector
* One simple method is to use a bit vector, in which each bit represents a disk block, set to 1 if the block is free or 0 if it is allocated.
* Fast algorithms exist for quickly finding contiguous blocks of a given size.
* The down side is that the bitmap itself takes up space: for example, a 40GB disk with 1KB blocks needs about 5MB just to store the bitmap.
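A minimal sketch of the bitmap scheme for a toy in-memory "disk" follows; the convention that a set bit means "free" matches the description above, while the block count and all names are invented for the example.

```c
#include <stdio.h>
#include <stdint.h>
#include <string.h>

#define NBLOCKS 1024                      /* hypothetical disk size in blocks */
#define WORDS   (NBLOCKS / 32)

static uint32_t bitmap[WORDS];            /* bit set to 1 means the block is free */

static void mark_free(int b)  { bitmap[b / 32] |=  (1u << (b % 32)); }
static void mark_used(int b)  { bitmap[b / 32] &= ~(1u << (b % 32)); }

/* Find the first free block, or -1 if the disk is full. */
static int first_free(void)
{
    for (int w = 0; w < WORDS; w++)
        if (bitmap[w] != 0)               /* an all-zero word is 32 allocated blocks */
            for (int b = 0; b < 32; b++)
                if ((bitmap[w] >> b) & 1u)
                    return w * 32 + b;
    return -1;
}

int main(void)
{
    memset(bitmap, 0xFF, sizeof bitmap);  /* start with every block free */

    int b = first_free();                 /* allocate the first free block (block 0) */
    mark_used(b);
    printf("next free block: %d\n", first_free());    /* prints 1 */

    mark_free(b);                         /* return block 0 to the free list */
    printf("after freeing: %d\n", first_free());      /* prints 0 again */
    return 0;
}
```

Scanning a whole 32-bit word at a time is what makes fast search algorithms possible here: an all-zero word lets the search skip 32 allocated blocks with a single comparison.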
Linked List
* A linked list can also be used to keep track of all free blocks.
* Traversing the list and/or finding a contiguous block of a given size are not easy, but fortunately these are not frequently needed operations. Normally the system just adds and removes single blocks from the beginning of the list.
* The FAT table keeps track of the free list as just one more linked list in the table.
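A minimal sketch of the linked free list, assuming each free block has room to record the number of the next free block (simulated here with an ordinary array rather than real disk I/O):

```c
#include <stdio.h>

#define NBLOCKS 16
#define NIL     (-1)

static int next_of[NBLOCKS];   /* next_of[b] = block number of the next free block */
static int free_head = NIL;    /* head of the free list                            */

/* Return a block to the front of the free list. */
static void free_block(int b)
{
    next_of[b] = free_head;
    free_head  = b;
}

/* Take a block from the front of the free list, or NIL if none remain. */
static int alloc_block(void)
{
    int b = free_head;
    if (b != NIL)
        free_head = next_of[b];
    return b;
}

int main(void)
{
    for (int b = NBLOCKS - 1; b >= 0; b--)   /* initially, every block is free */
        free_block(b);

    int a = alloc_block();
    int b = alloc_block();
    printf("allocated %d and %d\n", a, b);   /* 0 and 1 */
    return 0;
}
```

Note that both allocating and freeing touch only the head of the list, which is why full traversal is rarely needed.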
Grouping
A variation on the linked-list free list is to use linked blocks of indexes of free blocks. If a block holds up to N addresses, then the first block in the linked list contains up to N-1 addresses of free blocks and a pointer to the next block of free addresses.
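A sketch of what such an index block might look like, with a small N for readability; the nfree counter is added purely for the in-memory example and the block numbers are invented:

```c
#include <stdio.h>
#include <stdint.h>

#define N 4                              /* entries per block; tiny for the example */

struct index_block {
    uint32_t nfree;                      /* valid entries in free_addr (sketch only)    */
    uint32_t free_addr[N - 1];           /* addresses of up to N-1 free blocks          */
    uint32_t next;                       /* disk address of the next block of addresses */
};

int main(void)
{
    struct index_block ib = { 3, { 7, 9, 12 }, 42 };    /* invented free blocks */

    /* Allocation hands out the last recorded address; when nfree reaches 0,
     * the system would read the block at ib.next to refill the list. */
    uint32_t b = ib.free_addr[--ib.nfree];

    printf("allocated block %u, %u addresses left, next index block at %u\n",
           (unsigned)b, (unsigned)ib.nfree, (unsigned)ib.next);
    return 0;
}
```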
Counting
When there are multiple contiguous blocks of free space then the system can keep track of the starting address of the group and the number of contiguous free blocks. As long as the average length of a contiguous group of free blocks is greater than two, this offers a savings in the space
needed for the free list. ( Similar to run-length compression techniques used for graphics images when a group of pixels all the same color is encountered. )
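A minimal sketch of the counting scheme, where the free list is a set of ( start, length ) runs; the runs shown are invented for illustration:

```c
#include <stdio.h>
#include <stdint.h>

struct free_run {
    uint32_t start;    /* first free block of the run            */
    uint32_t count;    /* number of contiguous free blocks in it */
};

/* Allocate one block from the front of a run, shrinking it in place.
 * Returns the allocated block number, or -1 if the run is empty. */
static long alloc_from_run(struct free_run *r)
{
    if (r->count == 0)
        return -1;
    r->count--;
    return (long)r->start++;
}

int main(void)
{
    struct free_run runs[] = { { 100, 5 }, { 300, 2 } };   /* invented free space */

    printf("allocated block %ld\n", alloc_from_run(&runs[0]));   /* 100 */
    printf("run now starts at %u with %u blocks left\n",
           (unsigned)runs[0].start, (unsigned)runs[0].count);    /* 101, 4 */
    return 0;
}
```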
Space Maps
* Sun's ZFS file system was designed for HUGE numbers and sizes of files, directories, and even file systems.
* The resulting data structures could be VERY inefficient if not implemented carefully. For example, freeing up a 1 GB file on a 1 TB file system could involve updating thousands of blocks of free-list bit maps if the file was spread across the disk.
* ZFS uses a set of techniques, starting with dividing the disk up into ( hundreds of ) metaslabs of a manageable size, each having their own space map.
* Free blocks are managed using the counting technique, but rather than write the information to a table, it is recorded in a log-structured transaction record. Adjacent free blocks are also coalesced into a larger single free block.
* An in-memory space map is built using a balanced tree data structure, constructed from the log data.
* The combination of the in-memory tree and the on-disk log provides for very fast and efficient management of these very large files and free blocks, as sketched below.
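A rough sketch of replaying a space-map log into an in-memory structure. ZFS keeps the in-memory map in a balanced tree keyed on offset; a small array of runs stands in for it here, and the log records are invented for the example.

```c
#include <stdio.h>

enum { OP_FREE, OP_ALLOC };

struct log_rec { int op; long start; long count; };   /* counting-style extents */

#define MAXRUNS 64
static long run_start[MAXRUNS], run_count[MAXRUNS];
static int  nruns;

/* Record a freed extent, coalescing with an adjacent run when possible. */
static void add_free(long start, long count)
{
    for (int i = 0; i < nruns; i++) {
        if (run_start[i] + run_count[i] == start) {     /* extends a run upward   */
            run_count[i] += count;
            return;
        }
        if (start + count == run_start[i]) {            /* extends a run downward */
            run_start[i] = start;
            run_count[i] += count;
            return;
        }
    }
    run_start[nruns] = start;                           /* new, isolated run      */
    run_count[nruns] = count;
    nruns++;                                            /* (no overflow check in this sketch) */
}

/* Remove an allocated extent; the sketch assumes it starts exactly at a run. */
static void remove_alloc(long start, long count)
{
    for (int i = 0; i < nruns; i++)
        if (run_start[i] == start) {
            run_start[i] += count;
            run_count[i] -= count;
            return;
        }
}

int main(void)
{
    struct log_rec log[] = {
        { OP_FREE,  100, 50 },     /* free blocks 100-149            */
        { OP_FREE,  150, 25 },     /* adjacent: coalesces to 100-174 */
        { OP_ALLOC, 100, 10 },     /* allocate blocks 100-109        */
    };

    for (int i = 0; i < (int)(sizeof log / sizeof log[0]); i++) {
        if (log[i].op == OP_FREE)
            add_free(log[i].start, log[i].count);
        else
            remove_alloc(log[i].start, log[i].count);
    }

    for (int i = 0; i < nruns; i++)
        if (run_count[i] > 0)
            printf("free extent: start=%ld count=%ld\n", run_start[i], run_count[i]);
    return 0;
}
```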
Efficiency and Performance
Efficiency
* UNIX pre-allocates inodes, which occupy space even before any files are created.
* UNIX also distributes inodes across the disk, and tries to store data files near their inodes, to reduce the distance of disk seeks between the inodes and the data.
* Some systems use different size clusters depending on the file size.
* The more data that is kept in a directory ( e.g. last access time ), the more often the directory blocks have to be re-written.
* As technology improves, addressing schemes have had to grow as well.
• Sun's ZFS file system uses 128-bit pointers, which should theoretically never need to be expanded. ( The mass required to store 2^128 bytes with atomic storage would be at least 272 trillion kilograms! )
* Kernel table sizes used to be fixed, and could only be changed by rebuilding the kernel. Modern tables are dynamically allocated, but that requires more complicated algorithms for accessing them.
Performance
* Disk controllers generally include on-board caching. When a seek is requested, the heads are moved into place, and then an entire track is read, starting from whatever sector is currently under the heads ( reducing latency. ) The requested block is returned, and the unrequested portion of the track is cached in the disk's electronics.
* Some OSes cache disk blocks they expect to need again in a buffer cache.
* A page cache connected to the virtual memory system is actually more efficient, as memory addresses do not need to be converted to disk block addresses and back again.
* Some systems ( Solaris, Linux, Windows 2000, NT, XP ) use page caching for both process pages and file data in a unified virtual memory.
* The unified buffer cache found in some versions of UNIX and Linux has clear benefits: data does not need to be stored twice, and problems of inconsistent buffer information are avoided.
* Page replacement can be complicated with a unified cache, as one needs to decide whether to replace process pages or file pages, and how many pages to guarantee to each category of pages. Solaris, for example, has gone through many variations, resulting in priority paging giving process pages priority over file I/O pages, and setting limits so that neither can knock the other completely out of memory.
* Another issue affecting performance is the question of whether to implement synchronous writes or asynchronous writes. Synchronous writes occur in the order in which the disk subsystem receives them, without caching; asynchronous writes are cached, allowing the disk subsystem to schedule writes in a more efficient order. Metadata writes are often done synchronously. Some systems support flags to the open call requiring that writes be synchronous, for example for the benefit of database systems that require their writes be performed in a specific order.
* The type of file access can also have an impact on optimal page replacement policies. For example, LRU is not necessarily a good policy for sequential-access files. For these types of files access normally proceeds in a forward direction only, and the most recently used page will not be needed again until after the file has been rewound and re-read from the beginning, ( if it is ever needed at all. ) On the other hand, we can expect to need the next page in the file fairly soon. For this reason sequential-access files often take advantage of two special policies:
• Free-behind frees up a page as soon as the next page in the file is requested, with the assumption that we are now done with the old page and won't need it again for a long time.
• Read-ahead reads the requested page and several following pages at the same time, with the assumption that those pages will be needed in the near future. This is similar to the track caching that is already performed by the disk controller, except it saves the future latency of transferring data from the disk controller memory into motherboard main memory. ( A sketch of how an application can request these behaviors follows this list. )
* The caching system and asynchronous writes speed up disk writes considerably, because the disk subsystem can schedule physical writes to the disk to reduce head movement and disk seek times. Reads, on the other hand, must be done more synchronously in spite of the caching system, with the result that disk writes can counter-intuitively be much faster on average than disk reads.
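The sketch below shows how an application can ask a POSIX kernel for this kind of behavior explicitly via posix_fadvise(): POSIX_FADV_SEQUENTIAL hints that read-ahead is worthwhile, and POSIX_FADV_DONTNEED approximates free-behind. The file name is hypothetical, and the automatic policies described above apply even without such hints.

```c
#define _POSIX_C_SOURCE 200112L
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

#define CHUNK (64 * 1024)

int main(void)
{
    int fd = open("bigfile.dat", O_RDONLY);      /* hypothetical large file */
    if (fd < 0) { perror("open"); return 1; }

    /* Tell the kernel we will read sequentially, so aggressive read-ahead pays off. */
    posix_fadvise(fd, 0, 0, POSIX_FADV_SEQUENTIAL);

    char buf[CHUNK];
    off_t off = 0;
    ssize_t n;
    while ((n = read(fd, buf, sizeof buf)) > 0) {
        /* ... process buf ... */

        /* Free-behind: the pages just consumed will not be needed again soon. */
        posix_fadvise(fd, off, n, POSIX_FADV_DONTNEED);
        off += n;
    }

    close(fd);
    return 0;
}
```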
Recovery
Consistency Checking
* The storing of certain data structures ( e.g. directories and inodes ) in memory and the caching of disk operations can speed up performance, but what happens in the event of a system crash? All volatile memory structures are lost, and the information stored on the hard drive may be left in an inconsistent state.
* A Consistency Checker ( fsck in UNIX, chkdsk or scandisk in Windows ) is often run at boot time or mount time, particularly if a file system was not closed down properly. Some of the problems that these tools look for include the following ( a small sketch of such cross-checks appears after this list ):
• Disk blocks assigned to files and also listed on the free list.
• Disk blocks neither assigned to files nor on the free list.
• Disk blocks assigned to more than one file.
• The number of disk blocks assigned to a file inconsistent with the file's stated
size.
• Properly assigned files / index nodes which do not appear in any directory entry.
• Link counts for an index node not matching the number of references to that inode in
the directory structure.
• Two or more identical file names in the same directory.
• Illegally linked directories, e.g. cyclical relationships where they are not permitted, or files/directories that are not accessible from the root of the directory tree.
• Consistency checkers will often gather questionable disk blocks into new files
with names such as chk00001.dat. These files may contain valuable information
that would otherwise be lost, but in most cases they can be safely removed (returning those disk blocks to the free list. )
* UNIX caches directory information for reads, but any modifications that affect space allocation or metadata changes are written synchronously, before any of the corresponding data blocks are written.
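A toy illustration of the kind of cross-checking a consistency checker performs, using invented in-memory arrays in place of real on-disk metadata: every block should either be on the free list or belong to exactly one file.

```c
#include <stdio.h>

#define NBLOCKS 8

int main(void)
{
    /* Invented metadata: what the free list claims, and how many files
     * reference each block, as gathered by scanning all inodes. */
    int on_free_list[NBLOCKS] = { 1, 0, 0, 1, 0, 0, 1, 0 };
    int file_refs[NBLOCKS]    = { 0, 1, 2, 1, 0, 1, 0, 1 };

    for (int b = 0; b < NBLOCKS; b++) {
        if (on_free_list[b] && file_refs[b] > 0)
            printf("block %d: allocated to a file AND on the free list\n", b);
        else if (!on_free_list[b] && file_refs[b] == 0)
            printf("block %d: neither allocated nor on the free list\n", b);
        else if (file_refs[b] > 1)
            printf("block %d: allocated to %d different files\n", b, file_refs[b]);
    }
    return 0;
}
```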
Log-Structured File Systems
* Log-based transaction-oriented ( a.k.a. journaling ) file systems borrow methods developed for databases, guaranteeing that any given transaction either completes successfully or can be rolled back to a safe state before the transaction commenced:
• All metadata changes are written sequentially to a log.
• A set of changes for performing a particular task ( e.g. moving a file ) is a
transaction.
• As changes are written to the log they are said to be committed, permitting the system to return to its work.
• In the meantime, the changes from the log are carried out on the actual file system, and a pointer keeps track of which changes in the log have been completed and which have not yet been completed.
• When all edits corresponding to a particular transaction have been completed, that transaction can be safely removed from the log.
* At any given time, the log will contain information pertaining to uncompleted transactions only, i.e. actions that were committed but for which the whole transaction has not yet been completed.
• From the log, the remaining transactions can be completed,
• or if the transaction was aborted, then the partially completed changes can be undone ( see the replay sketch below. )
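A minimal sketch of the replay decision made after a crash: transactions with a commit record in the log are re-applied, while partial transactions are discarded. The log records below are invented for the example.

```c
#include <stdio.h>

struct log_rec {
    int  txn;                  /* transaction id                     */
    char op[32];               /* description of the metadata change */
    int  is_commit;            /* 1 if this is the commit record     */
};

int main(void)
{
    /* Log contents found on disk after a crash:
     * transaction 1 committed, transaction 2 did not. */
    struct log_rec log[] = {
        { 1, "update inode 17",       0 },
        { 1, "add dir entry 'a.txt'", 0 },
        { 1, "commit",                1 },
        { 2, "update inode 99",       0 },   /* no commit record: roll back */
    };
    int nrec = (int)(sizeof log / sizeof log[0]);

    for (int t = 1; t <= 2; t++) {
        int committed = 0;
        for (int i = 0; i < nrec; i++)           /* was a commit record written? */
            if (log[i].txn == t && log[i].is_commit)
                committed = 1;
        for (int i = 0; i < nrec; i++)           /* replay or discard the changes */
            if (log[i].txn == t && !log[i].is_commit)
                printf("txn %d: %-24s -> %s\n", t, log[i].op,
                       committed ? "replay" : "discard");
    }
    return 0;
}
```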
Other Solutions
* Sun's ZFS and Network Appliance's WAFL file systems take a different approach to file system consistency.
* No blocks of data are ever over-written in place. Rather the new data is written into fresh new blocks, and after the transaction is complete, the metadata ( data block pointers ) is updated to point to the new blocks.
• The old blocks can then be freed up for future use.
• On the other hand, if the old blocks and old metadata are saved, then a snapshot of the system in its original state is preserved. This approach is taken by WAFL.
* ZFS combines this with check-summing of all metadata and data blocks, and RAID, to ensure that no inconsistencies are possible, and therefore ZFS does not incorporate a consistency checker.
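A small in-memory sketch of the copy-on-write idea described above: the new version is written elsewhere first, and only then is the pointer switched, so the metadata never references half-written data; keeping the old block around yields a snapshot. The blocks here are ordinary heap allocations standing in for disk blocks.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

struct block { char data[32]; };          /* stand-in for an on-disk block */

int main(void)
{
    struct block *old = malloc(sizeof *old);
    strcpy(old->data, "original contents");
    struct block *current = old;          /* the "metadata" block pointer */

    /* Copy-on-write update: write the new version into a fresh block first... */
    struct block *fresh = malloc(sizeof *fresh);
    strcpy(fresh->data, "updated contents");

    /* ...then switch the pointer in one step.  Keeping 'old' around instead
     * of freeing it preserves a snapshot, as WAFL does. */
    current = fresh;

    printf("current:  %s\n", current->data);
    printf("snapshot: %s\n", old->data);

    free(fresh);
    free(old);
    return 0;
}
```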
Backup and Restore
* In order to recover lost data in the event of a disk crash, it is important to perform backups regularly.
* Files should be copied to some removable medium, such as magnetic tapes, CDs, DVDs, or external removable hard drives.
* A full backup duplicates every file on a file system.
* Incremental backups copy only files which have changed since some previous time.
* A combination of full and incremental backups can offer a compromise between full recoverability, the number and size of backup tapes needed, and the number of tapes that need to be used to do a full restore. For example, one strategy might be:
• At the start of the month do a full backup.
• At the end of the first and again at the end of the second week, back up all files which have changed since the beginning of the month.
• At the end of the third week, back up all files that have changed since the end of the second week.
• Every day of the month not listed above, do an incremental backup of all files that have changed since the most recent of the weekly backups described above.
* Backup tapes are often reused, particularly for daily backups, but there are limits to how many times the same tape can be used.
* Every so often a full backup must be made that is kept "forever" and not overwritten.
* Backup tapes must be tested, to ensure that they are readable!
* For optimal security, backup tapes must be kept off-premises, so that a fire or burglary cannot destroy both the system and the backups. There are companies ( e.g. Iron Mountain ) that specialize in the secure off-site storage of critical backup information.
* Keep your backup tapes safe. The easiest way for a thief to steal all your data is to simply pocket your backup tapes!
* Saving important files on more than one computer can be an alternative, though less reliable, form of backup.
* Note that incremental backups can also help users to get back a previous version of a file that they have since changed in some way.
* Beware that backups can help forensic investigators recover e-mails and other files that users had thought they had deleted!