
First Scenario

Let us suppose we have the following disk partition:
Disk /dev/sda2: 64 KiB, 65536 bytes, 128 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes

I have chosen such a small partition to make this first example simpler and easier to understand.

Let us make sure that it is completely empty by overwriting it with zeros and then reading it back:
# dd if=/dev/zero of=/dev/sda2
dd: writing to ‘/dev/sda2’: No space left on device
129+0 records in
128+0 records out
65536 bytes (66 kB) copied, 0.0222607 s, 2.9 MB/s

# od /dev/sda2
0000000 000000 000000 000000 000000 000000 000000 000000 000000
*
0200000

Of course, we can write to and read from the bare metal directly, without making or mounting any File System:
# dd of=/dev/sda2
My letter:
Dear friend, I wrote you these lines to show you that I can write directly to disk without the need of any file system.
Best regards,
Sebastian.
0+4 records in
0+1 records out
156 bytes (156 B) copied, 41.1907 s, 0.0 kB/s

# od -c /dev/sda2
0000000   M   y       l   e   t   t   e   r   :  \n   D   e   a   r    
0000020   f   r   i   e   n   d   ,       I       w   r   o   t   e    
0000040   y   o   u       t   h   e   s   e       l   i   n   e   s    
0000060   t   o       s   h   o   w       y   o   u       t   h   a   t
0000100       I       c   a   n       w   r   i   t   e       d   i   r
0000120   e   c   t   l   y       t   o       d   i   s   k       w   i
0000140   t   h   o   u   t       t   h   e       n   e   e   d       o
0000160   f       a   n   y       f   i   l   e       s   y   s   t   e
0000200   m   .  \n   B   e   s   t       r   e   g   a   r   d   s   ,
0000220  \n   S   e   b   a   s   t   i   a   n   .  \n  \0  \0  \0  \0
0000240  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0
*
0200000

We can even encrypt and decrypt our data in order to protect it from being read:
# dd if=/dev/sda2 count=1 status=none|gpg -ca|dd of=/dev/sda2 count=1
0+1 records in     
0+1 records out
319 bytes (319 B) copied, 6.7184 s, 0.0 kB/s

# od -c /dev/sda2
0000000   -   -   -   -   -   B   E   G   I   N       P   G   P       M
0000020   E   S   S   A   G   E   -   -   -   -   -  \n   V   e   r   s
0000040   i   o   n   :       G   n   u   P   G       v   1  \n  \n   j
0000060   A   0   E   A   w   M   C   x   8   h   Z   z   m   p   1   +
0000100   k   N   g   y   Z   7   W   4   t   m   W   z   q   H   7   v
0000120   p   1   t   F   2   R   4   k   d   Q   b   h   c   9   4   k
0000140   R   +   A   z   3   e   s   g   j   l   p   b   s   U   W  \n
0000160   2   Y   d   a   w   e   6   y   k   o   G   M   J   X   N   U
0000200   3   2   u   b   p   d   w   d   3   i   q   5   9   c   w   e
0000220   b   j   X   S   j   1   D   6   D   b   x   g   W   Y   V   S
0000240   2   u   9   W   3   N   U   P   b   F   u   h   d   U   M   d
0000260  \n   Z   m   B   p   2   J   a   q   c   +   G   6   Q   /   g
0000300   a   G   U   L   d   j   i   s   e   8   4   z   U   V   y   w
0000320   4   k   7   /   h   O   O   3   Y   c   H   Z   /   s   Z   e
0000340   k   B   A   q   B   4   2   9   N   k   +   I   /   E   8   A
0000360   O  \n   K   B   h   +   H   l   W   c   /   G   R   W   a   0
0000400   I   g   6   R   w   9   u   K   f   9   0   Y   s   H   a   P
0000420   3   6   Z   T   a   D   G   u   J   l   a   g   =   =  \n   =
0000440   e   s   c   A  \n   -   -   -   -   -   E   N   D       P   G
0000460   P       M   E   S   S   A   G   E   -   -   -   -   -  \n  \0
0000500  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0
*
0200000

# dd if=/dev/sda2 count=1 status=none|gpg -d|dd of=/dev/sda2 count=1
gpg: CAST5 encrypted data
gpg: encrypted with 1 passphrase
gpg: WARNING: message was not integrity protected
1+0 records in
1+0 records out
512 bytes (512 B) copied, 3.19328 s, 0.2 kB/s

# od -c /dev/sda2 
0000000   M   y       l   e   t   t   e   r   :  \n   D   e   a   r    
0000020   f   r   i   e   n   d   ,       I       w   r   o   t   e    
0000040   y   o   u       t   h   e   s   e       l   i   n   e   s    
0000060   t   o       s   h   o   w       y   o   u       t   h   a   t
0000100       I       c   a   n       w   r   i   t   e       d   i   r
0000120   e   c   t   l   y       t   o       d   i   s   k       w   i
0000140   t   h   o   u   t       t   h   e       n   e   e   d       o
0000160   f       a   n   y       f   i   l   e       s   y   s   t   e
0000200   m   .  \n   B   e   s   t       r   e   g   a   r   d   s   ,
0000220  \n   S   e   b   a   s   t   i   a   n   .  \n  \0  \0  \0  \0
0000240  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0
*
0200000

It is a very comfortable and useful way to write to and read from disk. Since our partition has 65536 bytes, and each character takes one byte on disk, we can write as many as 65536 characters in total.

Nevertheless, some people need more comfort when using computers and dealing with data or text. File Systems exist in order to better organize and manage the data and code written to disk. They keep an ordered list of all items (files) written to disk, as well as detailed information about their attributes, such as permissions, time stamps, size, ownership and location within the file system tree.

Let us make an Extended File System in our partition:
# mkfs /dev/sda2
mke2fs 1.42.12 (29-Aug-2014)
Creating filesystem with 64 1k blocks and 16 inodes

Allocating group tables: done                            
Writing inode tables: done                            
Writing superblocks and filesystem accounting information: done

# od -c /dev/sda2|head
0000000  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0
*
0002000 020  \0  \0  \0   @  \0  \0  \0 003  \0  \0  \0   +  \0  \0  \0
0002020 005  \0  \0  \0 001  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0
0002040  \0      \0  \0  \0      \0  \0 020  \0  \0  \0  \0  \0  \0  \0
0002060   W   k   o   U  \0  \0 377 377   S 357 001  \0 001  \0  \0  \0
0002100   W   k   o   U  \0  \0  \0  \0  \0  \0  \0  \0 001  \0  \0  \0
0002120  \0  \0  \0  \0  \v  \0  \0  \0 200  \0  \0  \0   8  \0  \0  \0
0002140 002  \0  \0  \0 001  \0  \0  \0   a   H 257 324   7   K   F 264
0002160 262 276   n 323   .   A 231 222  \0  \0  \0  \0  \0  \0  \0  \0

As you can see, our previous letter has been overwritten, and the partition is no longer empty: it is now full of metadata distributed across several blocks. This metadata is very important for us, since it describes the contents and characteristics of the file system.

Let us read the contents of the entire partition byte by byte. Remember that the Extended File System has divided the partition into blocks of 1024 bytes.

Block number 0: Partition Boot Record

# od -Ad -N1024 -j0 /dev/sda2 
0000000 000000 000000 000000 000000 000000 000000 000000 000000
*
0001024

Block number 1: Super Block

# od -Ad -txz -N1024 -j1024 /dev/sda2
0001024 00000010 00000040 00000000 0000002b  >....@.......+...<
0001040 00000005 00000001 00000000 00000000  >................<
0001056 00002000 00002000 00000010 5570b766  >. ... ......f.pU<
0001072 5570c5ce ffff0001 0001ef53 00000002  >..pU....S.......<
0001088 556f73b9 00000000 00000000 00000001  >.soU............<
0001104 00000000 0000000b 00000080 00000038  >............8...<
0001120 00000002 00000001 4f081d71 f545956b  >........q..Ok.E.<
0001136 72f0fe99 d9480b70 73726946 63535f74  >...rp.H.First_Sc<
0001152 72616e65 00006f69 746e6d2f 00000000  >enario../mnt....<
0001168 00000000 00000000 00000000 00000000  >................<
*
0001248 00000000 00000000 00000000 fd2690c5  >..............&.<
0001264 764ed7bb 01b9f6ad 97f2476f 00000001  >..Nv....oG......<
0001280 0000000c 00000000 556f73b9 00000000  >.........soU....<
0001296 00000000 00000000 00000000 00000000  >................<
*
0001376 00000001 00000000 00000000 00000000  >................<
0001392 00000000 00000000 00000004 00000000  >................<
0001408 00000000 00000000 00000000 00000000  >................<
*
0002048

s_inodes_count 0001024 16
s_blocks_count 0001028 64
s_r_blocks_count 0001032 0
s_free_blocks_count 0001036 43
s_free_inodes_count 0001040 5
s_first_data_block 0001044 1
s_log_block_size 0001048 0 (block size = 1024 << 0 = 1024)
s_log_frag_size 0001052 0 (fragment size = 1024 << 0 = 1024)
s_blocks_per_group 0001056 8192
s_frags_per_group 0001060 8192
s_inodes_per_group 0001064 16
s_mtime 0001068 Thu Jun  4 22:39:02 CEST 2015
s_wtime 0001072 Thu Jun  4 23:40:30 CEST 2015
s_mnt_count 0001076 1
s_max_mnt_count 0001078 65535
s_magic 0001080 61267
s_state 0001082 1
s_errors 0001084 2
s_minor_rev_level 0001086 0
s_lastcheck 0001088 Wed Jun  3 23:38:01 CEST 2015
s_checkinterval 0001092 0
s_creator_os 0001096 0
s_rev_level 0001100 1
s_def_resuid 0001104 0
s_def_resgid 0001106 0
s_first_ino 0001108 11
s_inode_size 0001112 128
s_block_group_nr 0001114 0
s_feature_compat 0001116 00000038
s_feature_incompat 0001120 00000002
s_feature_ro_compat 0001124 00000001
s_uuid 0001128 71 1d 08 4f 6b 95 45 f5 99 fe f0 72 70 0b 48 d9
s_volume_name 0001144 F   i   r   s   t   _   S   c   e   n   a   r   i   o  \0  \0
s_last_mounted 0001160 /   m   n   t  \0  \0  \0  \0  \0  \0  \0  \0  \0
s_algo_bitmap 0001224 0
s_prealloc_blocks 0001228 0
s_prealloc_dir_blocks 0001229 0
alignment 0001230 00 00
s_journal_uuid 0001232 \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0
s_journal_inum 0001248 0
s_journal_dev 0001252 0
s_last_orphan 0001256 0
s_hash_seed 0001260 fd2690c5 764ed7bb 01b9f6ad 97f2476f
s_def_hash_version 0001276 001
padding 0001277 00 00 00
s_default_mount_options 0001280 12
s_first_meta_bg 0001284 0
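
The field list above can be reproduced with a small helper that reads little-endian integers straight from the device. This is only a sketch: the function names u16le, u32le and show_superblock are invented for this example, and it assumes the superblock layout shown above (superblock at byte 1024, fields at the listed offsets).

```shell
# Read unsigned little-endian integers from a file or device at a byte offset.
# Bytes are fetched one at a time, so the result does not depend on the
# byte order of the machine running the script.
u16le() {  # usage: u16le FILE OFFSET
    set -- $(od -An -tu1 -j "$2" -N2 "$1")
    echo $(( $1 + ($2 << 8) ))
}

u32le() {  # usage: u32le FILE OFFSET
    set -- $(od -An -tu1 -j "$2" -N4 "$1")
    echo $(( $1 + ($2 << 8) + ($3 << 16) + ($4 << 24) ))
}

# Print a few superblock fields (hypothetical helper; offsets are relative
# to the start of the superblock, which begins at byte 1024).
show_superblock() {  # usage: show_superblock FILE
    sb=1024
    echo "s_inodes_count      $(u32le "$1" $((sb + 0)))"
    echo "s_blocks_count      $(u32le "$1" $((sb + 4)))"
    echo "s_free_blocks_count $(u32le "$1" $((sb + 12)))"
    echo "s_free_inodes_count $(u32le "$1" $((sb + 16)))"
    echo "block size          $(( 1024 << $(u32le "$1" $((sb + 24))) ))"
    printf 's_magic             0x%X\n' "$(u16le "$1" $((sb + 56)))"
}
```

Running show_superblock /dev/sda2 as root should report the same values as the list above, with the magic number shown in hexadecimal.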

Block number 2: Block Group Descriptor Table

# od -Ad -txz -N1024 -j2048 /dev/sda2
0002048 00000003 00000004 00000005 0005002b  >............+...<
0002064 00040002 00000000 00000000 00000000  >................<
0002080 00000000 00000000 00000000 00000000  >................<
*
0003072

bg_block_bitmap     0002048    3
bg_inode_bitmap     0002052    4
bg_inode_table     0002056    5
bg_free_blocks_count    0002060    43
bg_free_inodes_count 0002062    5

Block number 3: Block Bitmap

# echo $(od -b -An -N8 -j3072 /dev/sda2)|ascii2binary -bo|binary2ascii \
-bb|tac|paste -sd " "|rev
11111111 11111111 11110000 00000000 00000000 00000000 00000000 00000001

This is a bit-by-bit map of the blocks in the block group: a bit set to 1 means that the block is reserved or in use, and a bit set to 0 means that it is free. The first bit corresponds to block number 1 (s_first_data_block), so the output shows that blocks 1 to 20 are in use. The last bit does not correspond to a real block, since the blocks are numbered from 0 to 63; it is padding and is marked as used. This leaves 63 - 20 = 43 free blocks, which matches s_free_blocks_count. Remember that each block is 1024 bytes in size, so the 64 blocks of the partition add up to 64 KiB = 65536 bytes.
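
The same bit string can be produced without the ascii2binary and binary2ascii tools. The following is a minimal sketch using only od and shell arithmetic; the names bitmap_bits and count_set_bits are made up for this example.

```shell
# Print NBYTES of a bitmap starting at OFFSET, one group of 8 bits per byte,
# least significant bit first (the order in which ext2 numbers its blocks).
bitmap_bits() {  # usage: bitmap_bits FILE OFFSET NBYTES
    out=""
    for byte in $(od -An -tu1 -j "$2" -N "$3" "$1"); do
        bits=""
        i=0
        while [ "$i" -lt 8 ]; do
            bits="$bits$(( ($byte >> $i) & 1 ))"
            i=$((i + 1))
        done
        out="$out${out:+ }$bits"
    done
    echo "$out"
}

# Count how many bits are set to 1 in the same range.
count_set_bits() {  # usage: count_set_bits FILE OFFSET NBYTES
    bitmap_bits "$@" | tr -cd '1' | wc -c | tr -d ' '
}
```

For instance, bitmap_bits /dev/sda2 3072 8 should reproduce the line shown above, and bitmap_bits /dev/sda2 4096 2 does the same for the inode bitmap in block number 4.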

Block number 4: Inode Bitmap

# echo $(od -b -An -N2 -j4096 /dev/sda2)|ascii2binary -bo|binary2ascii \
-bb|tac|paste -sd " "|rev
11111111 11100000

Each bit represents an inode. We have 16 inodes, and the first 11 are already in use: inodes 1 to 10 are reserved by the file system (s_first_ino is 11), and inode 11 belongs to lost+found.

Block number 5: Inode Table

# od -Ad -txz -N1024 -j5120 /dev/sda2
0005120 00000000 00000000 556f73b9 556f73b9  >.........soU.soU<
0005136 556f73b9 00000000 00000000 00000000  >.soU............<
0005152 00000000 00000000 00000000 00000000  >................<
*
0005248 000041ed 00000400 5570b772 556f73b9  >.A......r.pU.soU<
0005264 556f73b9 00000000 00030000 00000002  >.soU............<
0005280 00000000 00000000 00000007 00000000  >................<
0005296 00000000 00000000 00000000 00000000  >................<
*
0005888 00008180 04043000 556f73b9 556f73b9  >.....0...soU.soU<
0005904 556f73b9 00000000 00010000 00000002  >.soU............<
0005920 00000000 00000000 00000000 00000000  >................<
*
0005968 00000000 00000000 00000000 00000014  >................<
0005984 00000000 00000000 00000000 00000000  >................<
*
0006144

i_mode     0005248 41ed
i_uid     0005250 0
i_size      0005252 1024
i_atime     0005256 Thu Jun  4 22:39:14 CEST 2015
i_ctime     0005260 Wed Jun  3 23:38:01 CEST 2015
i_mtime     0005264 Wed Jun  3 23:38:01 CEST 2015
i_dtime         0005268 Thu Jan  1 01:00:00 CET 1970
i_gid     0005272 0
i_links_count   0005274 3
i_blocks     0005276 2
i_flags     0005280 00000000
i_osd1     0005284 00000000
i_block     0005288 7    0    0    0    0    0    0    0
                0005320 0    0    0    0    0    0    0
i_generation 0005348 00000000
i_file_acl         0005352 0
i_dir_acl         0005356 0
i_faddr         0005360 0
l_i_frag         0005364 000
l_i_fsize         0005365 000
reserved         0005366 0000
l_i_uid_high 0005368 0000
l_i_gid_high 0005370 0000
reserved         0005372 00 00 00 00    

Block number 6: Inode Table (continued)

# od -Ad -txz -N1024 -j6144 /dev/sda2
0006144 00000000 00000000 00000000 00000000  >................<
*
0006400 000041c0 00003000 5570b772 556f73b9  >.A...0..r.pU.soU<
0006416 556f73b9 00000000 00020000 00000018  >.soU............<
0006432 00000000 00000000 00000008 00000009  >................<
0006448 0000000a 0000000b 0000000c 0000000d  >................<
0006464 0000000e 0000000f 00000010 00000011  >................<
0006480 00000012 00000013 00000000 00000000  >................<
0006496 00000000 00000000 00000000 00000000  >................<
*
0007168

i_mode         0006400 41c0
i_uid         0006402 0
i_size         0006404 12288
i_atime         0006408 Thu Jun  4 22:39:14 CEST 2015
i_ctime         0006412 Wed Jun  3 23:38:01 CEST 2015
i_mtime         0006416 Wed Jun  3 23:38:01 CEST 2015
i_dtime         0006420 Thu Jan  1 01:00:00 CET 1970
i_gid         0006424 0
i_links_count 0006426 2
i_blocks         0006428 24
i_flags         0006432 00000000
i_osd1         0006436 00000000
i_block         0006440 8   9   10  11  12  13  14  15
                0006472 16  17  18  19  0   0   0
i_generation 0006500 00000000
i_file_acl         0006504 0
i_dir_acl         0006508 0
i_faddr         0006512 0
l_i_frag         0006516 000
l_i_fsize         0006517 000
reserved         0006518 0000
l_i_uid_high 0006520 0000
l_i_gid_high 0006522 0000
reserved         0006524 00 00 00 00

Block number 7: Data Block for Inode number 2

# od -Ad -txz -N1024 -j7168 /dev/sda2
0007168 00000002 0201000c 0000002e 00000002  >................<
0007184 0202000c 00002e2e 0000000b 020a03e8  >................<
0007200 74736f6c 756f662b 0000646e 00000000  >lost+found......<
0007216 00000000 00000000 00000000 00000000  >................<
*
0008192

inode 0007168    2
rec_len 0007172    12
name_len 0007174    1
file_type 0007175    2
name 0007176    .

inode 0007180    2
rec_len 0007184    12
name_len 0007186    2
file_type 0007187    2
name 0007188    .   .

inode 0007192    11
rec_len 0007196    1000
name_len 0007198    10
file_type 0007199    2
name 0007200    l   o   s   t   +   f   o   u   n   d

Block number 8: Data Block for Inode number 11

# od -Ad -txz -N1024 -j8192 /dev/sda2
0008192 0000000b 0201000c 0000002e 00000002  >................<
0008208 020203f4 00002e2e 00000000 00000000  >................<
0008224 00000000 00000000 00000000 00000000  >................<
*
0009216

inode 0008192    11
rec_len 0008196    12
name_len 0008198    1
file_type 0008199    2
name 0008200    .

inode 0008204    2
rec_len 0008208    1012
name_len 0008210    2
file_type 0008211    2
name 0008212    .   .
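
The directory entries of blocks 7 and 8 can also be walked automatically by following rec_len from one entry to the next. This is a sketch only: list_dir_block is a name invented here, and it assumes the entry layout shown above (inode, rec_len, name_len, file_type, name).

```shell
# List the entries of one directory block as "inode rec_len name".
# Entry layout: inode (4 bytes), rec_len (2), name_len (1), file_type (1), name.
list_dir_block() {  # usage: list_dir_block FILE BLOCK_OFFSET BLOCK_SIZE
    pos=$2
    end=$(( $2 + $3 ))
    while [ "$pos" -lt "$end" ]; do
        # Keep FILE as $1 and load the 8 header bytes into $2..$9.
        set -- "$1" $(od -An -tu1 -j "$pos" -N8 "$1")
        [ "$#" -eq 9 ] || break                # short read: stop
        inode=$(( $2 + ($3 << 8) + ($4 << 16) + ($5 << 24) ))
        rec_len=$(( $6 + ($7 << 8) ))
        name_len=$8
        [ "$rec_len" -gt 0 ] || break          # guard against a corrupt block
        if [ "$inode" -ne 0 ]; then            # inode 0 marks an unused entry
            name=$(dd if="$1" bs=1 skip=$((pos + 8)) count="$name_len" 2>/dev/null)
            echo "$inode $rec_len $name"
        fi
        pos=$((pos + rec_len))                 # rec_len points to the next entry
    done
}
```

With the partition above, list_dir_block /dev/sda2 7168 1024 should list the three entries of the root directory, and list_dir_block /dev/sda2 8192 1024 the two entries of lost+found.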

We have seen that inodes number 2 and 11 are very important for the file system. Let us mount the device, in order to check the meaning of these two inodes:
# mount /dev/sda2 /mnt
# ls /mnt -aRli
/mnt:
total 17
 2 drwxr-xr-x  3 root root  1024 Jun  3 23:38 .
 2 drwxr-xr-x 22 root root  4096 May 24 22:19 ..
11 drwx------  2 root root 12288 Jun  3 23:38 lost+found

/mnt/lost+found:
total 13
11 drwx------ 2 root root 12288 Jun  3 23:38 .
 2 drwxr-xr-x 3 root root  1024 Jun  3 23:38 ..

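Inode number 2 has two different names in the mounted tree, and both of them refer to the same file (on disk, the '..' entry of the root directory also points to inode 2, which is why i_links_count is 3):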
/mnt/.
/mnt/lost+found/..

Inode number 11 also has two different names in this file system, and both of them refer to the same file:
/mnt/lost+found
/mnt/lost+found/.

Most of the superblock information can be obtained with the following command:
# tune2fs -l /dev/sda2|head
tune2fs 1.42.12 (29-Aug-2014)
Filesystem volume name:   <none>
Last mounted on:          <not available>
Filesystem UUID:          711d084f-6b95-45f5-99fe-f072700b48d9
Filesystem magic number:  0xEF53
Filesystem revision #:    1 (dynamic)
Filesystem features:      ext_attr resize_inode dir_index filetype sparse_super
Filesystem flags:         signed_directory_hash 
Default mount options:    user_xattr acl
Filesystem state:         clean

We can even modify some of these values with the same command:
# tune2fs -e remount-ro -L First_Scenario -m 0 -M /mnt /dev/sda2
tune2fs 1.42.12 (29-Aug-2014)
Setting error behavior to 2
Setting reserved blocks percentage to 0% (0 blocks)

# tune2fs -l /dev/sda2|egrep 'volume|mounted|Errors|block count'
Filesystem volume name:   First_Scenario
Last mounted on:          /mnt
Errors behavior:          Remount read-only
Reserved block count:     0

We can use the following command to get partial information about a file, including some inode details:
# stat /mnt/.
  File: ‘/mnt/.’
  Size: 1024       Blocks: 2          IO Block: 1024   directory
Device: 802h/2050d Inode: 2           Links: 3
Access: (0755/drwxr-xr-x)  Uid: (    0/    root)   Gid: (    0/    root)
Access: 2015-06-04 22:39:14.000000000 +0200
Modify: 2015-06-03 23:38:01.000000000 +0200
Change: 2015-06-03 23:38:01.000000000 +0200
 Birth: -

# stat /mnt/lost+found
  File: ‘/mnt/lost+found’
  Size: 12288      Blocks: 24         IO Block: 1024   directory
Device: 802h/2050d Inode: 11          Links: 2
Access: (0700/drwx------)  Uid: (    0/    root)   Gid: (    0/    root)
Access: 2015-06-04 22:39:14.000000000 +0200
Modify: 2015-06-03 23:38:01.000000000 +0200
Change: 2015-06-03 23:38:01.000000000 +0200
 Birth: -

The information shown by this last command is stored in the inodes, each of which is 128 bytes in size. Inode number 2 is stored in block number 5, between byte number 5248 (5*1024+128) and byte number 5376. Inode number 11 is stored in block number 6, between byte number 6400 (5*1024+10*128) and byte number 6528.
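
The arithmetic behind these offsets can be written as a one-line function. It is a sketch that assumes a single block group, as in this partition; the name inode_offset is made up for the example.

```shell
# Byte offset of an inode within the partition.
# Inode numbers start at 1, so inode N occupies slot N-1 of the inode table.
inode_offset() {  # usage: inode_offset TABLE_BLOCK BLOCK_SIZE INODE_SIZE INODE_NUMBER
    echo $(( $1 * $2 + ($4 - 1) * $3 ))
}
```

Here the inode table starts at block 5 (bg_inode_table), so inode_offset 5 1024 128 2 gives 5248 and inode_offset 5 1024 128 11 gives 6400.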

Although the 'stat' command gives us a lot of information about an inode, it does not show the pointers to the disk blocks that actually store the file's contents. We can retrieve them by reading the inode directly from disk. The pointers (i_block) start at byte offset 40 of the inode:
# od -An -i -w24 -N48 -j6440 /dev/sda2 
           8           9          10          11          12          13
          14          15          16          17          18          19

Since the file lost+found is 12288 bytes in size, it needs 12 blocks: block numbers 8 to 19. This file is actually a directory where the system is going to store the orphan data recovered after a file system crash. For that reason the 12 blocks are reserved in advance for future needs, but most of them are still empty: the only block with actual data is block number 8.

Similarly we can check which block stores data from the file described in inode 2:
# od -An -i -w24 -N48 -j5288 /dev/sda2 
           7           0          0          0          0          0
           0           0          0          0          0          0

This time a single data block is enough for a file that is 1024 bytes in size: block number 7.
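
The two od commands above can be wrapped in a helper that prints only the block pointers that are actually in use. This is a sketch: u32le and direct_blocks are names invented for this example, and only the 12 direct pointers are read (indirect blocks are ignored).

```shell
# Read an unsigned 32-bit little-endian integer at a byte offset.
u32le() {  # usage: u32le FILE OFFSET
    set -- $(od -An -tu1 -j "$2" -N4 "$1")
    echo $(( $1 + ($2 << 8) + ($3 << 16) + ($4 << 24) ))
}

# Print the non-zero direct block pointers of an inode.
# i_block starts at byte 40 of the inode; its first 12 entries are direct.
direct_blocks() {  # usage: direct_blocks FILE INODE_OFFSET
    n=0
    list=""
    while [ "$n" -lt 12 ]; do
        blk=$(u32le "$1" $(( $2 + 40 + 4 * $n )))
        if [ "$blk" -ne 0 ]; then
            list="$list${list:+ }$blk"
        fi
        n=$((n + 1))
    done
    echo "$list"
}
```

Given the dumps above, direct_blocks /dev/sda2 5248 should print 7 for the root directory, and direct_blocks /dev/sda2 6400 should print the numbers 8 to 19 for lost+found.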

We have now inspected how data is actually written to disk, as well as the metadata of the file system. In this experiment we have used a clean file system with just two directories in it: the root directory and the lost+found directory, both automatically created when building the file system. We have checked that inodes store relevant information about each file, such as the ownership, the permissions, and (most important of all) the pointers to the disk blocks which actually store the file's contents. With these pointers we could recover the contents of a file in case of a file system disaster.

Besides, we have seen that the name given to a file is not stored in its inode: it is kept in the directory that contains the file, together with the inode number and a few other details. A file is really described by its inode, and it can be given any name in its directory, even several different names for a single file. When we delete a file that has several names, we just remove the corresponding entry in the corresponding directory. But when we delete a file with just one name, its link count drops to zero, the inode is released, and the file is not accessible any more.
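
The relation between names and inodes can be observed on any file system that supports hard links. The following sketch uses a temporary directory instead of /dev/sda2; inode_of and demo_links are names invented for this example.

```shell
# Inode number of a path, taken from the first field of "ls -di".
inode_of() {  # usage: inode_of PATH
    ls -di "$1" | awk '{print $1}'
}

# Give one file two names, then delete one of them.
demo_links() {
    dir=$(mktemp -d)
    echo "hello" > "$dir/a"
    ln "$dir/a" "$dir/b"        # second directory entry for the same inode
    if [ "$(inode_of "$dir/a")" = "$(inode_of "$dir/b")" ]; then
        echo "same inode"
    fi
    rm "$dir/a"                 # removes one name; the inode is still linked
    cat "$dir/b"                # the contents are still reachable
    rm -r "$dir"
}
```

Calling demo_links prints "same inode" followed by the file contents, since removing the name a leaves the inode reachable through the name b.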

To learn how the system internally proceeds when deleting and creating regular files (not only directories) we have prepared the Second Scenario.

Attachments (Sebastian Colomar, 7 Jun 2015):
0-read-superblock.sh (6k)
1-read-block-group.sh (1k)
2-read-bitmap-block.sh (1k)
3-read-bitmap-inode.sh (1k)
4-read-inode-table.sh (3k)
5-read-directory.sh (2k)
6-read-hidden-directory.sh (2k)
7-read-file-data.sh (0k)
ext2.pdf (452k)