
Understanding 'Size' vs. 'Size on Disk'

Published in Storage Management Concepts

In computing, when you look at the properties of a file or a collection of files, you might see two different measurements: 'size' and 'size on disk'. These two values describe distinct aspects of how data is stored and managed.

Simply put: 'Size' tells you the raw amount of data, while 'Size on Disk' tells you the actual physical space it occupies on the storage medium.

What is 'Size'?

As per the reference, 'size' is:

"how much the total of all files is - excluding compression and deduplication."

This refers to the logical size of the data. If you were to open a file and count every byte of information contained within it, that count would represent its 'size'. For a folder, it's the sum of the 'size' of all files and subfolders within it. This number doesn't change based on how the storage system is configured or how efficiently it stores data; it represents the inherent volume of the data itself.
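As a rough illustration of the logical measurement, the sketch below adds up the byte counts of every regular file under a path. The function name `logical_size` is hypothetical, and skipping symbolic links is an assumption made here to avoid counting the same data twice; real file managers may count differently.

```python
import os


def logical_size(path):
    """Return the logical size in bytes of a file, or of all files under a folder."""
    if os.path.isfile(path):
        return os.path.getsize(path)
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            full = os.path.join(root, name)
            # Assumption: symbolic links are skipped so linked data is not counted twice.
            if not os.path.islink(full):
                total += os.path.getsize(full)
    return total
```

Calling `logical_size("some_folder")` returns the same number regardless of block size, compression, or deduplication, because it only counts the bytes the files contain.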

What is 'Size on Disk'?

According to the reference, 'Size on disk' is:

"physically used space on the underlying storage (SSD/HDD/NVMe/LUN etc), including any deduction for compression or block level deduplication, or contrary, any padding in the blocks where empty space is."

This measurement reflects the real amount of space the data consumes on your hard drive, SSD, or other storage device. Several factors can cause this value to differ from the logical 'size':

  • Compression: If the storage system or file system uses compression, the data is stored in a more compact form, reducing the 'size on disk'.
  • Deduplication: If the storage system uses deduplication, identical blocks of data are stored only once, with pointers referencing that single copy. This significantly reduces the 'size on disk' for redundant data.
  • Block Padding: File systems allocate space in fixed-size blocks, also called clusters or allocation units (e.g., 4 KB, 8 KB). Even if a file is smaller than one block (e.g., a 1 KB file on a file system with 4 KB blocks), it typically still occupies an entire block. The unused space within that block is padding, which increases the 'size on disk' compared to the file's actual 'size'.
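On Unix-like systems, both numbers can be read for a single file from its stat record. The sketch below is a minimal example under that assumption: `st_blocks` is reported in 512-byte units and is not available on Windows, in which case the hypothetical helper `size_and_allocated` returns `None` for the allocated figure. Whether the reported value reflects compression or deduplication depends on the file system.

```python
import os


def size_and_allocated(path):
    """Return (logical size, allocated bytes or None) for one file."""
    st = os.stat(path)
    blocks = getattr(st, "st_blocks", None)  # absent on Windows
    # st_blocks counts 512-byte units, independent of the file system's block size.
    allocated = blocks * 512 if blocks is not None else None
    return st.st_size, allocated
```

For a small file on a typical file system with 4 KB blocks, the second value is often 4096 even when the first is far smaller, although sparse files or inline storage of tiny files can make it lower.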

Key Differences Summarized

Here's a table highlighting the core distinctions based on the definition provided:

Feature                | 'Size'                                       | 'Size on Disk'
Represents             | Logical data volume                          | Actual physical space used on storage
Includes               | Total data bytes                             | Physical storage blocks occupied
Excludes               | Space savings from compression/deduplication |
Includes (Potentially) |                                              | Overhead/Padding due to block allocation
Affected By            | Amount of data in the file(s)                | File system, block size, compression, deduplication

Practical Insights and Examples

  • Compression Example: A 10MB document (Size: 10MB) stored on a compressed drive might only take up 5MB of space (Size on Disk: 5MB).
  • Deduplication Example: If you have five identical copies of a 1GB video file (Total Size: 5GB) on a storage system with deduplication, the system might only store the data once, plus some metadata. The 'Size on Disk' could be just slightly over 1GB, a significant saving compared to the total 'Size'.
  • Padding Example: Consider many small files (e.g., 100 files, each 1 KB in Size) on a file system with a 4 KB block size. Their total 'Size' is 100 KB. However, assuming the file system gives each file its own block, each file occupies a full 4 KB block, so the 'Size on Disk' would be 100 * 4 KB = 400 KB.
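The arithmetic behind these examples can be reproduced with a toy model. The sketch below is illustrative only: the 4096-byte block size, the helper names `padded_size` and `deduplicated_size`, and the scaled-down 1 MiB payload (standing in for the 1 GB video) are all assumptions, and real storage systems also keep metadata that this model ignores.

```python
import hashlib
import zlib

BLOCK = 4096  # assumed allocation unit, in bytes


def padded_size(size, block=BLOCK):
    """Round a byte count up to a whole number of blocks."""
    return -(-size // block) * block


def deduplicated_size(files, block=BLOCK):
    """Bytes needed if every distinct block is stored once (metadata ignored)."""
    unique = set()
    for data in files:
        for start in range(0, len(data), block):
            unique.add(hashlib.sha256(data[start:start + block]).digest())
    return len(unique) * block


# Padding: 100 files of 1 KB each, one block per file.
small_files = [1024] * 100
padding_logical = sum(small_files)                          # 100 KB
padding_on_disk = sum(padded_size(s) for s in small_files)  # 400 KB

# Deduplication: five identical copies of a payload built from 256 distinct blocks.
payload = b"".join(i.to_bytes(4, "big") * (BLOCK // 4) for i in range(256))
copies = [payload] * 5
dedup_logical = sum(len(c) for c in copies)  # five times the payload
dedup_on_disk = deduplicated_size(copies)    # one copy's worth of blocks

# Compression: repetitive data shrinks; the ratio depends entirely on the content.
text = b"size on disk " * 1000
compressed_len = len(zlib.compress(text))
```

In this model the padded total is four times the logical total, and the five copies deduplicate down to the space of one, which mirrors the figures in the examples above. The compression ratio is not fixed at 2:1; it varies with how redundant the data is.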

Understanding the difference is crucial for managing storage space effectively, especially in environments utilizing advanced storage features like data reduction technologies. While 'size' tells you how much data you have, 'size on disk' tells you how much space it's consuming.
