DGTEFF: Difference between revisions
Hex Workshop has another handy function - <font color="#008000">''GoTo''</font>. If you select a range of bytes in a file and choose GoTo from the context menu, you can jump to the location identified by the selected value.
==Terms, Definitions, and Data Structures==
To understand the patterns and construction of archives, we must first introduce the concept of data structures, and some of the fundamentals of computerized data. | |||
===Files===
A computer <font color="#008000">''file''</font> is a series of bytes stored one after the other which, when combined together, form a representation of a piece of data. If you have a file that is 12 bytes in size, it indicates that there are 12 single bytes of data that are used to represent the entire document.
{|cellspacing="0" cellpadding = "0" style="border-style:solid; border-width:1px; border-collapse:collapse" width="100%"
|align = "justify" bgcolor = "#E6E6E6"|The term <font color="#008000">File</font> stems from the original computer metaphor as an office replacement. As in a work office, <font color="#008000">files</font> were organised into <font color="#008000">folders</font>, where each folder contained a group of related files. | |||
|- | |||
|} | |||
File sizes start at the byte, and the term changes at every multiple of 1024 (although, for ease of human use, most people refer to multiples of 1000). The following table shows the file size terms:
{|border="2" cellspacing="0" cellpadding="4" width="88%"
|align = "center" bgcolor = "#E6E6E6"|<font color="#000080">Byte</font> <font color="#008000">(B)</font> | |||
|align = "center"|<br> | |||
|align = "center"| | |||
|align = "center"| | |||
|align = "center"| | |||
|- | |||
|align = "center" bgcolor = "#E6E6E6"|<font color="#000080">Kilobyte</font> <font color="#008000">(KB)</font> | |||
|align = "center"|1,024<br><font size = "1">(1 thousand bytes)</font> | |||
|align = "center"| | |||
|align = "center"| | |||
|align = "center"| | |||
|- | |||
|align = "center" bgcolor = "#E6E6E6"|<font color="#000080">Megabyte </font><font color="#008000">(MB)</font> | |||
|align = "center"|1,048,576<br><font size = "1">(1 million bytes)</font> | |||
|align = "center"|1,024<br><font size = "1">(1 thousand KB)</font> | |||
|align = "center"| | |||
|align = "center"| | |||
|- | |||
|align = "center" bgcolor = "#E6E6E6"|<font color="#000080">Gigabyte</font> <font color="#008000">(GB)</font> | |||
|align = "center"|1,073,741,824<br><font size = "1">(1 billion bytes)</font> | |||
|align = "center"|1,048,576<br><font size = "1">(1 million KB)</font> | |||
|align = "center"|1,024<br><font size = "1">(1 thousand MB)</font> | |||
|align = "center"| | |||
|- | |||
|align = "center" bgcolor = "#E6E6E6"|<font color="#000080">Terabyte</font> <font color="#008000">(TB)</font> | |||
|align = "center"|1,099,511,627,776<br><font size = "1">(1 trillion bytes)</font> | |||
|align = "center"|1,073,741,824<br><font size = "1">(1 billion KB)</font> | |||
|align = "center"|1,048,576<br><font size = "1">(1 million MB)</font> | |||
|align = "center"|1,024<br><font size = "1">(1 thousand GB)</font> | |||
|- | |||
|} | |||
{|cellspacing="0" cellpadding = "0" style="border-style:solid; border-width:1px; border-collapse:collapse" width="100%"
|align = "justify" bgcolor = "#E6E6E6"|In actual fact, computer data is stored using <font color="#008000">bits</font>, not bytes. A bit is the smallest unit that a computer can deal with; however, all modern file systems treat a byte as the smallest unit, as a byte is capable of storing relatively useful information. It is impossible to store a single bit on its own in a modern file system - the best that can be done is to store a single byte that has the same value as the bit.
|- | |||
|} | |||
===Bits===
When we talk about the basic structure of a file, we typically think in terms of bytes. However, at its absolute simplest, the actual underlying file structure is a sequence of bits or binary values. We don<nowiki>’</nowiki>t usually deal with this level of representation because a single binary value cannot represent anything meaningful on its own. However, when grouped into sets of 8 bits, the range of information that can be stored becomes satisfactory.
A bit, or binary value, is the language of a computer, and thus the underlying structure of everything readable by a computer. A bit only has 2 possible values – <font color="#800000">1</font> or <font color="#800000">0</font> – thus it is obvious why they are limited in what they represent.
{|cellspacing="0" cellpadding = "0" style="border-style:solid; border-width:1px; border-collapse:collapse" width="100%"
|align = "justify" bgcolor = "#E6E6E6"|The 2 possible values of a bit, <font color="#800000">0</font> and <font color="#800000">1</font>, are also commonly referred to as being either <font color="#008000">false</font> or <font color="#008000">true</font> (respectively). It can therefore be said that a bit is either a <font color="#008000">true-bit</font> or a <font color="#008000">false-bit</font>. | |||
|- | |||
|align = "justify" bgcolor = "#E6E6E6"| | |||
|- | |||
|align = "justify" bgcolor = "#E6E6E6"|Sometimes, although less common, a bit with value <font color="#800000">0</font> is referred to as being <font color="#008000">disabled</font>, and value <font color="#800000">1</font> is <font color="#008000">enabled</font>. This can sometimes help in user understanding, depending on the context of the discussion. | |||
|- | |||
|} | |||
A byte, the fundamental building block of files, is constructed using a group of 8 bits. The combination of 8 bits allows a byte to hold any value between 0 and 255, much more than the 2 possible values available to a single bit.
So how do the grouped bits represent a larger numerical value such as that of a byte? This is achieved quite easily by treating each of the 8 bits as an increasing power of 2.
If we take a look at a single bit, we can think of it as having either the value <font color="#800000">1</font>x2<sup>0</sup> or <font color="#800000">0</font>x2<sup>0</sup> – thus giving us the values 1 or 0 respectively. If we add a bit to the left, the value of the new bit is either <font color="#800000">1</font>x2<sup>1</sup> or <font color="#800000">0</font>x2<sup>1</sup> – either 2 or 0. By adding the values of these 2 bits together, you should be able to see that all possible combinations will give us the values 0, 1, 2, and 3, as shown in the table below:
{|border="2" cellspacing="0" cellpadding="4" width="92%"
|align = "center" bgcolor = "#E6E6E6"|<font color="#000080">Bit 1 (2<sup>1</sup>)</font> | |||
|align = "center" bgcolor = "#E6E6E6"|<font color="#000080">Bit 0 (2<sup>0</sup>)</font> | |||
|align = "center" bgcolor = "#E6E6E6"|<font color="#000080">Value</font> | |||
|- | |||
|align = "center"|0 | |||
|align = "center"|0 | |||
|align = "center"|0 (<font color="#800000">0</font>x2<sup>1</sup> <nowiki>+</nowiki> <font color="#800000">0</font>x2<sup>0</sup>) | |||
|- | |||
|align = "center"|0 | |||
|align = "center"|1 | |||
|align = "center"|1 (<font color="#800000">0</font>x2<sup>1</sup> <nowiki>+</nowiki> <font color="#800000">1</font>x2<sup>0</sup>) | |||
|- | |||
|align = "center"|1 | |||
|align = "center"|0 | |||
|align = "center"|2 (<font color="#800000">1</font>x2<sup>1</sup> <nowiki>+</nowiki> <font color="#800000">0</font>x2<sup>0</sup>) | |||
|- | |||
|align = "center"|1 | |||
|align = "center"|1 | |||
|align = "center"|3 (<font color="#800000">1</font>x2<sup>1</sup> <nowiki>+</nowiki> <font color="#800000">1</font>x2<sup>0</sup>) | |||
|- | |||
|} | |||
If we continue this pattern for the remaining 6 bits, our highest bit will provide the power 2<sup>7</sup>. If all our 8 bits are <font color="#008000">''enabled''</font>, we would end up with the number 255 (<font color="#800000">1</font>x2<sup>7</sup> <nowiki>+</nowiki> <font color="#800000">1</font>x2<sup>6</sup> <nowiki>+</nowiki> … <nowiki>+</nowiki> <font color="#800000">1</font>x2<sup>1</sup> <nowiki>+</nowiki> <font color="#800000">1</font>x2<sup>0</sup>). <font color="#800080"><u>Appendix 1</u></font> provides a list of all possible byte values, and their bit values.
===Bytes===
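The power-of-2 sum described above can be sketched in a few lines of Python. This is only an illustration; the function name <code>bits_to_value</code> is an assumption made for this example and is not part of any tool mentioned in this document.

```python
def bits_to_value(bits: str) -> int:
    """Add up bit x 2**power, where the rightmost bit has power 0."""
    total = 0
    for power, bit in enumerate(reversed(bits)):
        total += int(bit) * 2 ** power
    return total


print(bits_to_value("11"))        # 3
print(bits_to_value("11111111"))  # 255
```

The same function works for any number of bits, so it can also be used to check the 16-bit and larger values discussed later.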
As described above, a byte is comprised of 8 bits, and can thus contain a value between 0 and 255. Bytes are the smallest unit that a modern file system can deal with, so for the majority of file format analysis you will only need to look at the byte level. | |||
All files are stored and accessed using bytes. When you open a file in a program or game, the bytes of the file are interpreted according to the logic of the application. For example, a word processor treats all bytes as being letters or numbers, whereas a hex editor displays bytes as hex codes. Hex codes will be discussed in a later chapter. | |||
===16-bit (2-byte) numbers===
From this point forward, we need to be careful when referring to particular data types. Why? Because as computers and programming languages have evolved, the terminology has changed and confusion can arise. Therefore we will primarily refer to each data type by the number of bits or bytes that comprise it.
We will also briefly introduce the terms used by each group of programming languages, so you will be able to program with them.
A 16-bit value is commonly known in <font color="#008000">''older programming languages''</font> as a word or an Integer. <font color="#008000">''Newer programming languages''</font> call it a Short. | |||
{|cellspacing="0" cellpadding = "0" style="border-style:solid; border-width:1px; border-collapse:collapse" width="100%" | |||
|align = "justify" bgcolor = "#E6E6E6"|The term <font color="#008000">older programming languages</font> refers to the language <font color="#FF6600">C<nowiki>++</nowiki></font>, and any language that was derived before this time, such as <font color="#FF6600">C</font>, <font color="#FF6600">Visual Basic</font> (1.0 - 6.0), <font color="#FF6600">ASP</font>, <font color="#FF6600">Perl</font>, <font color="#FF6600">Pascal</font>, etc. | |||
|- | |||
|align = "justify" bgcolor = "#E6E6E6"| | |||
|- | |||
|align = "justify" bgcolor = "#E6E6E6"|The term <font color="#008000">newer programming languages</font> refers to languages derived after C<nowiki>++</nowiki>, such as <font color="#FF6600">Java</font>, <font color="#FF6600">Python</font>, <font color="#FF6600">Delphi</font>, and the <font color="#FF6600">.Net</font> languages (<font color="#FF6600">C<nowiki>#</nowiki></font>, <font color="#FF6600">VB.net</font>, <font color="#FF6600">ASP.net</font>, <font color="#FF6600">J<nowiki>#</nowiki></font>)
|- | |||
|} | |||
A 16-bit number is just as the name suggests, a number created by 16 bits in a row. To determine the value of the 16-bit number, we follow the same process as when we wanted to get the value of a byte. | |||
Each of the 16 bits that make up the 16-bit number represents a power of 2 – the leftmost bit represents 2<sup>15</sup> and the rightmost bit 2<sup>0</sup>. Just as with bytes, we just go through each bit and calculate the <font color="#800000">''bitvalue''</font> x <font color="#800000">''power''</font>. | |||
An example – let<nowiki>’</nowiki>s say we have the following 16 bits…
0101111000001100
Working from left to right, we get the value…
<font color="#800000">0</font>x2<sup>15</sup> <nowiki>+</nowiki> <font color="#800000">1</font>x2<sup>14</sup> <nowiki>+</nowiki> <font color="#800000">0</font>x2<sup>13</sup> <nowiki>+</nowiki> <font color="#800000">1</font>x2<sup>12</sup> <nowiki>+</nowiki> <font color="#800000">1</font>x2<sup>11</sup> <nowiki>+</nowiki> … <nowiki>+</nowiki> <font color="#800000">1</font>x2<sup>2</sup> <nowiki>+</nowiki> <font color="#800000">0</font>x2<sup>1</sup> <nowiki>+</nowiki> <font color="#800000">0</font>x2<sup>0</sup>
If you work this out, you should end up with the number 24076.
If all 16 bits had the value 1, you would end up with the number 65535 – therefore the value of a 16-bit number ranges between 0 and 65535. | |||
===32-bit (4-byte) numbers===
A 32-bit number follows the same principles as a 16-bit number, with the exception that there are now 32 bits that represent the value. Therefore, the <font color="#008000">''highest''</font> bit has a value 2<sup>31</sup> and the <font color="#008000">''lowest''</font> bit has value 2<sup>0</sup>. | |||
{|cellspacing="0" cellpadding = "0" style="border-style:solid; border-width:1px; border-collapse:collapse" width="100%" | |||
|align = "justify" bgcolor = "#E6E6E6"|The bit with the highest power value is known as the <font color="#008000">high-order bit</font>, and in the same vein, the bit with the lowest power value is the <font color="#008000">low-order bit</font>. When a binary number is written out, the high-order bit is on the left and the low-order bit is on the right. The order in which the bytes of a multi-byte number are stored is a separate question, known as endian ordering (<font color="#008000">little-endian</font> or <font color="#008000">big-endian</font>) - more information about endian ordering will be presented in a later section.
|- | |||
|} | |||
If all the bits for a 32-bit number were enabled, we would have the value 4,294,967,295, thus the range of values for a 32-bit number is 0 to 4,294,967,295. | |||
A 32-bit number in older programming languages is known as a <font color="#008000">''dword''</font> or a Long. In newer languages, this is known as an Integer. | |||
{|cellspacing="0" cellpadding = "0" style="border-style:solid; border-width:1px; border-collapse:collapse" width="100%" | |||
|align = "justify" bgcolor = "#E6E6E6"|<font color="#008000">Dword</font> is an abbreviation for <font color="#008000">Double Word</font>, meaning that a dword has double the number of bits that a word has (ie 32 = 2x16). | |||
|- | |||
|} | |||
===64-bit (8-byte) numbers===
As with 32-bit numbers, 64-bit numbers can be calculated with the highest bit value 2<sup>63</sup> and lowest 2<sup>0</sup>. Thus, the range 0 to a massive 18,446,744,073,709,551,615. Due to the extreme size of this number, it is not expected that we will ever need to define a larger term. | |||
64-bit numbers are not supported by some of the older programming languages - those that do call it a <font color="#008000">''qword''</font>. All newer programming languages refer to this data type as a Long. | |||
{|cellspacing="0" cellpadding = "0" style="border-style:solid; border-width:1px; border-collapse:collapse" width="100%" | |||
|align = "justify" bgcolor = "#E6E6E6"|64-bit numbers are relatively new concepts in the computer world, brought on by the ever-increasing size of hard drives, and technologies such as <font color="#FF6600">DVD</font>. Old file systems such as <font color="#008000">FAT-32</font> (used by later releases of <font color="#FF6600">Windows 95</font> and by <font color="#FF6600">Windows 98</font>) were, as the name suggests, built around 32-bit numbers, but this inherently caused a problem with large files. Because a 32-bit number has a maximum value of 4,294,967,295, a single file could not be larger than about 4.3 billion bytes (4GB). Due to this problem, 64-bit numbers were introduced, which allow for practically unlimited file sizes. 64-bit numbers are used in more modern file systems (<font color="#008000">NTFS</font> for <font color="#FF6600">Windows XP</font>), and for technologies like <font color="#FF6600">DVD</font> that have large storage space.
|- | |||
|align = "justify" bgcolor = "#E6E6E6"| | |||
|- | |||
|align = "justify" bgcolor = "#E6E6E6"|A similar situation occurred during the transition from <font color="#FF6600">Windows 3.1</font> to <font color="#FF6600">Windows 95</font>, where computer systems that were originally built on the <font color="#008000">FAT-16</font> 16-bit file system were upgraded to <font color="#008000">FAT-32</font>. | |||
|- | |||
|} | |||
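As a rough illustration of the three unsigned sizes described above, the following Python sketch uses the standard <code>struct</code> module. The format codes <code>H</code>, <code>I</code>, and <code>Q</code> are Python<nowiki>’</nowiki>s own names for unsigned 16-bit, 32-bit, and 64-bit numbers (with the <code>&lt;</code> prefix selecting little-endian, standard-size packing); the helper <code>max_unsigned</code> and the table <code>UNSIGNED_TYPES</code> are names invented for this example.

```python
import struct

# (label, struct format code, number of bits)
UNSIGNED_TYPES = [
    ("16-bit", "<H", 16),
    ("32-bit", "<I", 32),
    ("64-bit", "<Q", 64),
]


def max_unsigned(bits: int) -> int:
    """Largest value when every one of the given bits is enabled."""
    return 2 ** bits - 1


for label, fmt, bits in UNSIGNED_TYPES:
    size = struct.calcsize(fmt)                       # bytes used in a file
    packed = struct.pack(fmt, max_unsigned(bits))     # all bits enabled
    print(label, size, "bytes, maximum", max_unsigned(bits), packed.hex())
```

Each packed maximum comes out as a run of FF bytes, which is what a field with every bit enabled looks like in a hex editor.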
===Strings=== | |||
One of the most common tasks performed on a computer is word processing, so naturally we need some way of representing text in a document. A piece of text in a document is called a String, which more formally means a sequence of <font color="#008000">''characters''</font>. | |||
{|cellspacing="0" cellpadding = "0" style="border-style:solid; border-width:1px; border-collapse:collapse" width="100%" | |||
|align = "justify" bgcolor = "#E6E6E6"|You need to be careful when using the term <font color="#008000">character</font>, as it can be different depending on the programming language, and indeed depending on the language of your country. A character in older programming languages is usually the same as a <font color="#008000">byte</font>, whereas in newer languages it is often the same as a <font color="#008000">16-bit short</font>. If the game or file was developed in a primarily English-speaking country (as most are), characters will usually be bytes regardless of the programming language used to write the game. Games from non-English speaking countries will usually be 16-bit shorts. | |||
|- | |||
|} | |||
Although there are many languages in the world, the language that early Western computer systems were designed around is English. The English script consists of 52 letters (upper and lower case), 10 digits, and about 30 symbols. As this adds up to about 92, it seems quite logical that we can represent each character as a different byte value (remembering that a byte supports 256 different values). This is exactly what happens when you open a text document in a word processor – the word processor reads the bytes of the file and represents each byte value as a character.
For example, when the word processor reads a byte with value 65, it displays the letter <nowiki>’</nowiki>A<nowiki>’</nowiki>. The byte value 100 represents the letter <nowiki>’</nowiki>d<nowiki>’</nowiki>. Therefore, you can open any file in a word processor and it will be displayed as characters, regardless of whether it is a text document or not - the word processor simply doesn<nowiki>’</nowiki>t know that it isn<nowiki>’</nowiki>t a text file. The representation of a byte as a character is defined as <font color="#008000">''ASCII''</font>, for which the character associations are listed in <font color="#800080"><u>Appendix 2</u></font>. | |||
{|cellspacing="0" cellpadding = "0" style="border-style:solid; border-width:1px; border-collapse:collapse" width="100%" | |||
|align = "justify" bgcolor = "#E6E6E6"|<font color="#008000">ASCII</font> stands for <font color="#008000">American Standard Code for Information Interchange</font>, which was originally defined as a 7-bit character system (as all letters and numbers account for less than 128 values). As computer systems evolved, bytes became the standard unit of the computer, and as such the ASCII standard was adjusted into a full 8-bit character system. The original letters remained the same, with the 8th bit having the value 0. The newly-created 128 characters (the ones with the 8th bit equal to 1) were assigned to additional common characters such as letters with accents, foreign currency symbols, and other miscellaneous symbols like fractions and degrees. | |||
|- | |||
|} | |||
To expand the computing world into other languages, it became apparent that there are thousands more letters and symbols - many more than the 256 values a byte can hold. Therefore, an alternate character scheme called Unicode was created, which, in the form most often found in game files, uses 2 bytes to represent each character rather than the usual 1 byte. To accommodate the original ASCII coding scheme, the value for each ASCII character is the same as the value for the first byte in each Unicode character, with the second byte having the value 0.
It is usually easy to determine whether a string is ASCII or Unicode. ASCII strings are easy to read in a hex editor, whereas the same English string represented as Unicode has a null byte between each letter (the second byte of each Unicode character.) | |||
Here is an example string represented as ASCII and Unicode. Note that the null bytes are represented with a <font color="#800000">'''.'''</font> symbol, as is common in many hex editors. | |||
{|border="0" cellspacing="2" width="100%" | |||
|align = "justify"|<font color="#000080">Original: | |||
|align = "justify"|</font>When I run fast, my legs get tired. | |||
|- | |||
|align = "justify"|<font color="#000080">ASCII: | |||
|align = "justify"|</font>When I run fast, my legs get tired. | |||
|- | |||
|align = "justify"|<font color="#000080">Unicode: | |||
|align = "justify"|</font>W<font color="#800000">'''.'''</font>h<font color="#800000">'''.'''</font>e<font color="#800000">'''.'''</font>n<font color="#800000">'''.'''</font> <font color="#800000">'''.'''</font>I<font color="#800000">'''.'''</font> <font color="#800000">'''.'''</font>r<font color="#800000">'''.'''</font>u<font color="#800000">'''.'''</font>n<font color="#800000">'''.'''</font> <font color="#800000">'''.'''</font>f<font color="#800000">'''.'''</font>a<font color="#800000">'''.'''</font>s<font color="#800000">'''.'''</font>t<font color="#800000">'''.'''</font>,<font color="#800000">''' .'''</font> <font color="#800000">'''.'''</font>m<font color="#800000">'''.'''</font>y<font color="#800000">'''.'''</font> <font color="#800000">'''.'''</font>l<font color="#800000">'''.'''</font>e<font color="#800000">'''.'''</font>g<font color="#800000">'''.'''</font>s<font color="#800000">'''.'''</font> <font color="#800000">'''.'''</font>g<font color="#800000">'''.'''</font>e<font color="#800000">'''.'''</font>t<font color="#800000">'''.'''</font> <font color="#800000">'''.'''</font>t<font color="#800000">'''.'''</font>i<font color="#800000">'''.'''</font>r<font color="#800000">'''.'''</font>e<font color="#800000">'''.'''</font>d<font color="#800000">'''.'''</font>.<font color="#800000">''' .'''</font> | |||
|- | |||
|} | |||
As you can see, the ASCII string appears the same as it would in a word processor, whereas the Unicode string consumes 2 bytes per character and thus seems padded out. Note that every character in the Unicode string has 2 bytes, including the spaces and punctuation.
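The difference between the two representations can be reproduced with a short Python sketch. It assumes the 2-byte Unicode form is what Python calls <code>utf-16-le</code> (low byte first, matching the little-endian assumption used in this document); the helper <code>printable</code> is a made-up name that imitates the text column of a hex editor.

```python
text = "When I run fast, my legs get tired."

ascii_bytes = text.encode("ascii")        # 1 byte per character
unicode_bytes = text.encode("utf-16-le")  # 2 bytes per character


def printable(data: bytes) -> str:
    """Show bytes as a hex editor's text column would: '.' for non-printable bytes."""
    return "".join(chr(b) if 32 <= b < 127 else "." for b in data)


print(len(ascii_bytes), printable(ascii_bytes))
print(len(unicode_bytes), printable(unicode_bytes))
```

A quick test for the 2-byte form of an English string is therefore to check whether every second byte is null.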
===Hexadecimal Numbering=== | |||
Hexadecimal numbering is an alternate way to represent the byte values 0-255. In traditional numbering, you need 1-3 characters to display the possible values for a byte. For example, the number 5 requires 1 character, whereas the number 113 requires 3 characters. Hexadecimal numbering was introduced to represent all byte values using exactly 2 characters, which means that bytes can easily be arranged into neat rows and columns. | |||
Recall that a byte contains a value between 0 and 255. To write any of these values in hexadecimal, we split it into 2 characters, each representing a power of 16. This is done in a similar way to binary numbers, where each character represents a power of 2. | |||
A problem arises: how do we represent 16 possible values in a single character? It is obvious that the values 0-9 can be represented as normal digits, and for the values 10-15 we assign the letters A through F respectively. For example, the letter C in hexadecimal represents the value 12.
So how do we write a number in powers of 16? As mentioned earlier, the byte value is split up into 2 characters, with the first character representing 16<sup>1</sup> and the second character representing 16<sup>0</sup>. You should notice that this is the same way bits are joined together to form a byte.
The second character of the pair can take any value between 0 and 15 (labelled 0 through F), where the value represents <font color="#800000">''number''</font> x 16<sup>0</sup>. So, if the second character was <font color="#800000">6</font>, it would represent the number <font color="#800000">6</font>x16<sup>0</sup> – the value 6. If the second character was <font color="#800000">B</font>, it would similarly represent the value <font color="#800000">11</font>x16<sup>0</sup> – the value 11.
The first character of the pair represents the value <font color="#800000">''number''</font> x 16<sup>1</sup>. So, if the first character was <font color="#800000">2</font>, it would represent the value <font color="#800000">2</font>x16<sup>1</sup> – the value 32.
Let<nowiki>’</nowiki>s look at a full example now. If we are given the hexadecimal value <font color="#800000">1F</font>, what does it represent? The <font color="#800000">1</font> means <font color="#800000">1</font>x16<sup>1</sup>, and the <font color="#800000">F</font> means <font color="#800000">15</font>x16<sup>0</sup>. Added together, we get 16 <nowiki>+</nowiki> 15, the value 31. Similarly, the hexadecimal number <font color="#800000">E3</font> represents <font color="#800000">14</font>x16<sup>1</sup> <nowiki>+</nowiki> <font color="#800000">3</font>x16<sup>0</sup>, the number 227.
It should be clear to you now that we can represent any byte (values 0 through 255) in the hexadecimal number system using the values 00 through FF. If we are writing a hexadecimal number in a document, we use the format <font color="#800000">&h<nowiki># </nowiki></font>. For example, <font color="#800000">&hE3</font> means <font color="#800000">E3</font> in the hexadecimal coding scheme. | |||
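The two-character calculation can be checked with a minimal Python sketch. The function name <code>hex_pair_to_value</code> is an assumption for this example; Python<nowiki>’</nowiki>s built-in <code>int(text, 16)</code> and <code>format(value, "02X")</code> do the same conversions directly.

```python
HEX_DIGITS = "0123456789ABCDEF"


def hex_pair_to_value(pair: str) -> int:
    """Convert a 2-character hexadecimal pair into a byte value (0-255)."""
    high, low = pair.upper()
    return HEX_DIGITS.index(high) * 16 ** 1 + HEX_DIGITS.index(low) * 16 ** 0


print(hex_pair_to_value("1F"))  # 31
print(hex_pair_to_value("E3"))  # 227
print(format(227, "02X"))       # E3
```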
===Signed and Unsigned Numbers=== | |||
Hopefully by now you can clearly see how numbers are stored in files, and even how strings are stored, but what about negative numbers? Luckily, negative numbers are really easy. | |||
There are only 2 possible types of numbers – either positive or negative. This maps perfectly with a single bit of value 0 or 1 respectively.
Rather than add an extra bit to a number, we take the bit with the highest value and use it to carry the sign. In the scheme used by virtually all modern computers (known as <font color="#008000">''two<nowiki>’</nowiki>s complement''</font>), the highest bit counts as a negative value instead of a positive one. In an 8-bit number, for example, you would add up all the bits from 2<sup>0</sup> to 2<sup>6</sup> as usual, and if the 2<sup>7</sup> bit is enabled you subtract 128 from the total instead of adding 128.
You should note that because the highest bit is being used for another purpose (identifying positive/negative), it cannot be used to extend the positive range of the number. This effectively cuts the largest possible value in half. In our example, you would normally be able to have any value between 0 and 255, however with the sign bit we now have numbers between -128 and 127. As there is no such thing as -0, the bit code 10000000 is given the value -128.
Here we need to introduce a way of knowing whether a number will be positive-only, or a positive/negative number. We therefore use the term <font color="#008000">''signed''</font> to indicate that the highest bit is used as a sign, or the term <font color="#008000">''unsigned''</font> indicating the number is always positive. Therefore, if you are told a 16-bit number is unsigned, you will know the number ranges between 0 and 65535. However, if it was a signed 16-bit number, it would range between –32768 and 32767.
{|cellspacing="0" cellpadding = "0" style="border-style:solid; border-width:1px; border-collapse:collapse" width="100%" | |||
|align = "justify" bgcolor = "#E6E6E6"|The type of file usually determines whether the numbers are signed or unsigned. For example, <font color="#008000">archives</font> and <font color="#008000">images</font> are almost solely <font color="#008000">''unsigned''</font> values. <font color="#008000">3D</font>-related files are often <font color="#008000">''signed''</font>, as it is possible to have points in the negative as well as the positive plane. | |||
|- | |||
|} | |||
You should note that signed numbers are extremely rare for archives, and as such, you should assume all numbers used in archives and in this document are unsigned. | |||
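The following Python sketch shows how the same bits give different values depending on whether they are read as signed or unsigned. It assumes the two<nowiki>’</nowiki>s complement scheme, which is what Python<nowiki>’</nowiki>s <code>int.from_bytes</code> uses when <code>signed=True</code>; the helper <code>to_signed</code> is a name invented for this example.

```python
def to_signed(value: int, bits: int) -> int:
    """Reinterpret an unsigned value as a two's complement signed value."""
    if value >= 2 ** (bits - 1):   # highest bit is enabled
        return value - 2 ** bits
    return value


raw = bytes([0b10000000])
print(int.from_bytes(raw, "little", signed=False))  # 128
print(int.from_bytes(raw, "little", signed=True))   # -128
print(to_signed(65535, 16))                         # -1
```

If a field that should hold a file length or offset shows up as a negative number, this is usually a sign that it was read as signed when it should have been unsigned.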
===Big-Endian and Little-Endian=== | |||
If you paid close attention, you would have noticed that whenever we calculated a number, the bit with the highest value was always on the left, and the lowest value on the right. Within a single byte this ordering never changes. However, when a number is built from more than one byte (as 16-bit, 32-bit, and 64-bit numbers are), files, programs, and computer systems do not all agree on which byte should be stored first. So once again, we need to define some terms so that people know what order we are talking about.
{|cellspacing="0" cellpadding = "0" style="border-style:solid; border-width:1px; border-collapse:collapse" width="100%"
|align = "justify" bgcolor = "#E6E6E6"|<font color="#008000">Little-Endian</font> order, in which the lowest-value byte is stored first, is the one we will be using in this document, and unless stated specifically you should assume that Little-Endian order is used in any file. The alternative is <font color="#008000">Big-Endian</font> ordering, in which the highest-value byte is stored first.
|-
|}
So let<nowiki>’</nowiki>s see an example. Take the following 2 bytes, shown in the order they are stored in a file
10001110 00000001
If you have been following the document so far, you would quickly calculate the values of these 2 bytes as being 142 and 1. In Little-Endian ordering, the first byte is the low-order byte, so the 16-bit number is
<font color="#800000">142</font>x256<sup>0</sup> <nowiki>+</nowiki> <font color="#800000">1</font>x256<sup>1</sup> = 398
However, in Big-Endian ordering we need to take the bytes in the opposite order, so the first byte is the high-order byte
<font color="#800000">142</font>x256<sup>1</sup> <nowiki>+</nowiki> <font color="#800000">1</font>x256<sup>0</sup> = 36353
It is always important to read the bytes in the correct order, otherwise you will end up with numbers that are meaningless and incorrect. As mentioned, if you don<nowiki>’</nowiki>t know which order to use, assume Little-Endian ordering - we will be using Little-Endian order for all examples in this document.
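In practice, endian ordering is most visible in the order of whole bytes within a multi-byte number. The Python sketch below reads the same 2 bytes both ways using the built-in <code>int.from_bytes</code>; the byte values <code>&h8E</code> and <code>&h01</code> are arbitrary sample data chosen for this example.

```python
raw = bytes([0x8E, 0x01])  # two bytes in the order they appear in a file

little = int.from_bytes(raw, byteorder="little")  # first byte is the low-order byte
big = int.from_bytes(raw, byteorder="big")        # first byte is the high-order byte

print(little)  # 398   (142 + 1 x 256)
print(big)     # 36353 (142 x 256 + 1)
```

When a field gives an implausibly large value, such as a file length bigger than the archive itself, trying the other byte order is a sensible first check.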
===File Offsets=== | |||
One of the most fundamental concepts in format exploration is the file offset. A file offset is the position of a certain piece of data in a file, measured from the first byte of the file. However, as with most computer programming, we start our number counts at 0, not at 1. Therefore, if we are at the very beginning of the file, before we read anything, we are at offset 0. After we read 1 byte, we are at offset 1. Read another 6 bytes and we are at offset 7.
If the concept is a little hard to grasp, think of an offset as being a bar that divides a file up byte-by-byte. In the examples below, each digit stands for one whole byte. If we are at the beginning of a file, offset 0, we have a bar right at the beginning before the first byte
<font color="#800000">'''I'''</font>0110001011011000001011110 | |||
If we are at offset 3, we place the bar after the 3rd byte of the file, and before the 4th byte (ie. we have read 3 bytes) | |||
011<font color="#800000">'''I'''</font>0001011011000001011110 | |||
Similarly, offset 16 places the bar after byte 16, and before byte 17 | |||
0110001011011000<font color="#800000">'''I'''</font>001011110 | |||
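Offsets behave the same way in code. The Python sketch below uses an in-memory stream from the standard <code>io</code> module as a stand-in for a real file; the 25 sample bytes and the helper name <code>byte_at</code> are assumptions made for this example.

```python
import io

# A stand-in for a 25-byte file; byte N simply holds the value N.
stream = io.BytesIO(bytes(range(25)))


def byte_at(source, offset: int) -> int:
    """Jump to an offset and return the value of the byte that follows it."""
    source.seek(offset)
    return source.read(1)[0]


print(stream.tell())  # 0  (nothing read yet)
stream.read(1)
print(stream.tell())  # 1
stream.read(6)
print(stream.tell())  # 7
print(byte_at(stream, 16))  # 16 (the 17th byte of the file)
```

Here <code>tell()</code> reports the current offset and <code>seek()</code> moves the bar to a chosen offset, much like the GoTo function of a hex editor.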
==Archive Patterns== | |||
There are literally thousands of different archives out there, however most archives will conform to one of several basic patterns. Here we present the basic archive patterns so you can understand how the files are built, and how they can be read. Once you know the patterns, you can identify archive formats much faster. | |||
{|cellspacing="0" cellpadding = "0" style="border-style:solid; border-width:1px; border-collapse:collapse" width="100%" | |||
|align = "justify" bgcolor = "#E6E6E6"|Note that the samples presented here list some fields like the <font color="#008000">number of files</font> and the <font color="#008000">header</font>. These fields are not fixed, and indeed may be totally different to the structure of your archive. You should use the samples presented here as a guide to the overall structure, not as an exact guide to a specific format.
|- | |||
|} | |||
In these examples, the numerical value at the start of each field indicates the number of bytes used to contain the field value. For example, the line <font color="#000080">4 - Directory Offset</font> shows that there are 4 bytes used to store the directory offset. This field would thus be read as a 32-bit number, as described earlier. | |||
{|cellspacing="0" cellpadding = "0" style="border-style:solid; border-width:1px; border-collapse:collapse" width="100%" | |||
|align = "justify" bgcolor = "#E6E6E6"|Note that most fields will either be <font color="#008000">2</font>, <font color="#008000">4</font>, or <font color="#008000">8</font> bytes in length, corresponding to the data types presented earlier (<font color="#008000">16-bit</font>, <font color="#008000">32-bit</font>, and <font color="#008000">64-bit</font> respectively). The main exception is the filename field, which naturally could be any arbitrary length. | |||
|- | |||
|} | |||
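As a small illustration of reading such a field, here is a Python sketch using the standard <code>struct</code> module. The 8-byte header, its tag <code>HEAD</code>, and the field values are hypothetical; Little-Endian order is assumed, as elsewhere in this document.

```python
import struct

# Hypothetical header: a 4-byte tag followed by a 4-byte directory offset.
header = b"HEAD" + bytes([0x00, 0x01, 0x00, 0x00])

tag = header[0:4].decode("ascii")

# "<I" means: little-endian, unsigned 32-bit integer (4 bytes).
(directory_offset,) = struct.unpack("<I", header[4:8])
```

The bytes <code>00 01 00 00</code> read in Little-Endian order give the value 256. For 16-bit and 64-bit fields the format codes would be <code>"<H"</code> and <code>"<Q"</code> respectively.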
===Directory Archives=== | |||
Directory Archives are by far the most common structure in use today. As the name suggests, these archives store a directory that lists details about all the files, such as their name, offset and length. These archives are usually simple and very easy to read. | |||
The directory can be stored anywhere in the archive, although it is typically close to the beginning or the end. If the directory is not at the start of the archive, there will typically be a field that gives the offset to the directory, so that you can find it easily. This field is called the <font color="#008000">''directory offset''</font>, and is usually found in the header or at the very end of the archive.
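When the directory offset sits at the very end of the archive, it can be found by seeking backwards from the end of the file. The following Python sketch assumes a made-up archive whose last 4 bytes hold a Little-Endian directory offset; real formats may place or size this field differently.

```python
import io
import struct

# Hypothetical archive: tag, 20 filler bytes, then a 4-byte directory offset.
archive = b"HEAD" + b"\x00" * 20 + struct.pack("<I", 12)
f = io.BytesIO(archive)

f.seek(-4, io.SEEK_END)                        # 4 bytes before the end of the file
(directory_offset,) = struct.unpack("<I", f.read(4))

f.seek(directory_offset)                       # now positioned at the directory
position = f.tell()
```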
Here is a sample graphic representation of this archive structure: | |||
{|border="2" cellspacing="0" cellpadding="4" width="90%" | |||
|align = "justify" bgcolor = "#D9D9D9" colspan = "3"|<font color="#000080">Archive Header</font> | |||
|- | |||
|align = "justify"| | |||
|align = "justify" colspan = "2"|4 - Header Tag (String)<br>4 - Number of Files | |||
|- | |||
|align = "justify" colspan = "3"| | |||
|- | |||
|align = "justify" bgcolor = "#D9D9D9" colspan = "3"|<font color="#000080">Directory</font> | |||
|- | |||
|align = "justify"| | |||
|align = "justify" bgcolor = "#E6E6E6" colspan = "2"|<font color="#800000">File Entry 1</font> | |||
|- | |||
|align = "justify"| | |||
|align = "justify"| | |||
|align = "justify"|4 - File Offset<br>4 - File Size<br>X - Filename | |||
|- | |||
|align = "justify"| | |||
|align = "justify" bgcolor = "#E6E6E6" colspan = "2"|<font color="#800000">File Entry 2</font> | |||
|- | |||
|align = "justify"| | |||
|align = "justify"| | |||
|align = "justify"|4 - File Offset<br>4 - File Size<br>X - Filename | |||
|- | |||
|align = "justify"| | |||
|align = "justify" bgcolor = "#E6E6E6" colspan = "2"|<font color="#800000">…</font> | |||
|- | |||
|align = "justify"| | |||
|align = "justify" bgcolor = "#E6E6E6" colspan = "2"|<font color="#800000">File Entry ''n''</font> | |||
|- | |||
|align = "justify"| | |||
|align = "justify"| | |||
|align = "justify"|4 - File Offset<br>4 - File Size<br>X - Filename | |||
|- | |||
|align = "justify" colspan = "3"| | |||
|- | |||
|align = "justify" bgcolor = "#D9D9D9" colspan = "3"|<font color="#000080">File Data</font> | |||
|- | |||
|align = "justify"| | |||
|align = "justify" bgcolor = "#E6E6E6" colspan = "2"|<font color="#800000">File Data 1</font> | |||
|- | |||
|align = "justify"| | |||
|align = "justify"| | |||
|align = "justify"|X - File Data | |||
|- | |||
|align = "justify"| | |||
|align = "justify" bgcolor = "#E6E6E6" colspan = "2"|<font color="#800000">File Data 2</font> | |||
|- | |||
|align = "justify"| | |||
|align = "justify"| | |||
|align = "justify"|X - File Data | |||
|- | |||
|align = "justify"| | |||
|align = "justify" bgcolor = "#E6E6E6" colspan = "2"|<font color="#800000">…</font> | |||
|- | |||
|align = "justify"| | |||
|align = "justify" bgcolor = "#E6E6E6" colspan = "2"|<font color="#800000">File Data ''n''</font> | |||
|- | |||
|align = "justify"| | |||
|align = "justify"| | |||
|align = "justify"|X - File Data | |||
|- | |||
|} | |||
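The layout above could be read along the following lines. This is only a sketch under several assumptions that the diagram does not fix: fields are Little-Endian, filenames are null-terminated ASCII strings, and file offsets are measured from the start of the archive. The function names and the sample archive are invented for illustration.

```python
import io
import struct


def read_cstring(f):
    """Read bytes up to (not including) a null terminator."""
    out = bytearray()
    while (b := f.read(1)) not in (b"", b"\x00"):
        out += b
    return out.decode("ascii")


def read_directory_archive(f):
    """Return (tag, {filename: data}) for the layout sketched above."""
    tag, count = struct.unpack("<4sI", f.read(8))
    entries = []
    for _ in range(count):
        offset, size = struct.unpack("<II", f.read(8))
        entries.append((offset, size, read_cstring(f)))
    files = {}
    for offset, size, name in entries:
        f.seek(offset)              # jump to the file data
        files[name] = f.read(size)
    return tag, files


def build_sample():
    """Build a tiny two-file archive in memory to exercise the reader."""
    payloads = [(b"a.txt", b"hello"), (b"b.bin", b"\x01\x02\x03")]
    dir_size = sum(8 + len(name) + 1 for name, _ in payloads)
    offset = 8 + dir_size           # file data starts after header + directory
    directory, body = b"", b""
    for name, data in payloads:
        directory += struct.pack("<II", offset, len(data)) + name + b"\x00"
        body += data
        offset += len(data)
    return b"HEAD" + struct.pack("<I", len(payloads)) + directory + body


tag, files = read_directory_archive(io.BytesIO(build_sample()))
```

Reading the whole directory first and only then seeking to the file data keeps the two steps separate, which mirrors how the diagram is organised.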
===Split Directory Archives=== | |||
Split Directory Archives are similar in structure to the Directory Archive, with the main difference being that there are multiple separate directories rather than a single collective directory. The way the directories are split is entirely up to the format<nowiki>’</nowiki>s designer, so here we will present 2 of the more common split types.
This first example is a split directory, where the first directory contains the offsets and lengths, and the second directory contains the filenames: | |||
{|border="2" cellspacing="0" cellpadding="4" width="90%" | |||
|align = "justify" bgcolor = "#D9D9D9" colspan = "3"|<font color="#000080">Archive Header</font> | |||
|- | |||
|align = "justify"| | |||
|align = "justify" colspan = "2"|4 - Header Tag (String)<br>4 - Number of Files<br>4 - <font color="#000080">''Files Directory''</font> Offset<br>4 - <font color="#000080">''Filenames Directory''</font> Offset | |||
|- | |||
|align = "justify" colspan = "3"| | |||
|- | |||
|align = "justify" bgcolor = "#D9D9D9" colspan = "3"|<font color="#000080">Files Directory</font> | |||
|- | |||
|align = "justify"| | |||
|align = "justify" bgcolor = "#E6E6E6" colspan = "2"|<font color="#800000">File Entry 1</font> | |||
|- | |||
|align = "justify"| | |||
|align = "justify"| | |||
|align = "justify"|4 - File Offset<br>4 - File Size | |||
|- | |||
|align = "justify"| | |||
|align = "justify" bgcolor = "#E6E6E6" colspan = "2"|<font color="#800000">File Entry 2</font> | |||
|- | |||
|align = "justify"| | |||
|align = "justify"| | |||
|align = "justify"|4 - File Offset<br>4 - File Size | |||
|- | |||
|align = "justify"| | |||
|align = "justify" bgcolor = "#E6E6E6" colspan = "2"|<font color="#800000">…</font> | |||
|- | |||
|align = "justify"| | |||
|align = "justify" bgcolor = "#E6E6E6" colspan = "2"|<font color="#800000">File Entry ''n''</font> | |||
|- | |||
|align = "justify"| | |||
|align = "justify"| | |||
|align = "justify"|4 - File Offset<br>4 - File Size | |||
|- | |||
|align = "justify" colspan = "3"| | |||
|- | |||
|align = "justify" bgcolor = "#D9D9D9" colspan = "3"|<font color="#000080">Filenames Directory</font> | |||
|- | |||
|align = "justify"| | |||
|align = "justify" bgcolor = "#E6E6E6" colspan = "2"|<font color="#800000">File Entry 1</font> | |||
|- | |||
|align = "justify"| | |||
|align = "justify"| | |||
|align = "justify"|X - Filename | |||
|- | |||
|align = "justify"| | |||
|align = "justify" bgcolor = "#E6E6E6" colspan = "2"|<font color="#800000">File Entry 2</font> | |||
|- | |||
|align = "justify"| | |||
|align = "justify"| | |||
|align = "justify"|X - Filename | |||
|- | |||
|align = "justify"| | |||
|align = "justify" bgcolor = "#E6E6E6" colspan = "2"|<font color="#800000">…</font> | |||
|- | |||
|align = "justify"| | |||
|align = "justify" bgcolor = "#E6E6E6" colspan = "2"|<font color="#800000">File Entry ''n''</font> | |||
|- | |||
|align = "justify"| | |||
|align = "justify"| | |||
|align = "justify"|X - Filename | |||
|- | |||
|align = "justify" colspan = "3"| | |||
|- | |||
|align = "justify" bgcolor = "#D9D9D9" colspan = "3"|<font color="#000080">File Data</font> | |||
|- | |||
|align = "justify"| | |||
|align = "justify" bgcolor = "#E6E6E6" colspan = "2"|<font color="#800000">File Data 1</font> | |||
|- | |||
|align = "justify"| | |||
|align = "justify"| | |||
|align = "justify"|X - File Data | |||
|- | |||
|align = "justify"| | |||
|align = "justify" bgcolor = "#E6E6E6" colspan = "2"|<font color="#800000">File Data 2</font> | |||
|- | |||
|align = "justify"| | |||
|align = "justify"| | |||
|align = "justify"|X - File Data | |||
|- | |||
|align = "justify"| | |||
|align = "justify" bgcolor = "#E6E6E6" colspan = "2"|<font color="#800000">…</font> | |||
|- | |||
|align = "justify"| | |||
|align = "justify" bgcolor = "#E6E6E6" colspan = "2"|<font color="#800000">File Data ''n''</font> | |||
|- | |||
|align = "justify"| | |||
|align = "justify"| | |||
|align = "justify"|X - File Data | |||
|- | |||
|} | |||
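A reader for this first split layout might look like the Python sketch below. It assumes Little-Endian fields, null-terminated ASCII filenames stored back to back, and that the ''n''th name belongs to the ''n''th offset/size pair; the helper names and sample data are hypothetical.

```python
import io
import struct


def read_names(f, count):
    """Read `count` null-terminated ASCII names stored back to back."""
    names = []
    for _ in range(count):
        raw = bytearray()
        while (b := f.read(1)) not in (b"", b"\x00"):
            raw += b
        names.append(raw.decode("ascii"))
    return names


def read_split_directory(f):
    """Return {filename: data} for the two-directory layout above."""
    tag, count, files_dir, names_dir = struct.unpack("<4sIII", f.read(16))
    f.seek(files_dir)
    entries = [struct.unpack("<II", f.read(8)) for _ in range(count)]
    f.seek(names_dir)
    names = read_names(f, count)
    files = {}
    for name, (offset, size) in zip(names, entries):  # pair entries by position
        f.seek(offset)
        files[name] = f.read(size)
    return files


# Build a small sample archive in memory.
names = [b"a.txt", b"b.bin"]
payloads = [b"hello", b"\x01\x02\x03"]
files_dir = 16                                   # directly after the header
names_dir = files_dir + 8 * len(names)
data_start = names_dir + sum(len(n) + 1 for n in names)

entries, offset = b"", data_start
for p in payloads:
    entries += struct.pack("<II", offset, len(p))
    offset += len(p)

archive = (
    b"HEAD"
    + struct.pack("<III", len(names), files_dir, names_dir)
    + entries
    + b"".join(n + b"\x00" for n in names)
    + b"".join(payloads)
)
files = read_split_directory(io.BytesIO(archive))
```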
This second example is a split directory, where the first directory contains the offsets, the second directory contains the lengths, and the third directory contains the filenames: | |||
{|border="2" cellspacing="0" cellpadding="4" width="90%" | |||
|align = "justify" bgcolor = "#D9D9D9" colspan = "3"|<font color="#000080">Archive Header</font> | |||
|- | |||
|align = "justify"| | |||
|align = "justify" colspan = "2"|4 - Header Tag (String)<br>4 - Number of Files<br>4 - <font color="#000080">''Offsets Directory''</font> Offset<br>4 - <font color="#000080">''Lengths Directory''</font> Offset<br>4 - <font color="#000080">''Filenames Directory''</font> Offset | |||
|- | |||
|align = "justify" colspan = "3"| | |||
|- | |||
|align = "justify" bgcolor = "#D9D9D9" colspan = "3"|<font color="#000080">Offsets Directory</font> | |||
|- | |||
|align = "justify"| | |||
|align = "justify" bgcolor = "#E6E6E6" colspan = "2"|<font color="#800000">File Entry 1</font> | |||
|- | |||
|align = "justify"| | |||
|align = "justify"| | |||
|align = "justify"|4 - File Offset | |||
|- | |||
|align = "justify"| | |||
|align = "justify" bgcolor = "#E6E6E6" colspan = "2"|<font color="#800000">File Entry 2</font> | |||
|- | |||
|align = "justify"| | |||
|align = "justify"| | |||
|align = "justify"|4 - File Offset | |||
|- | |||
|align = "justify"| | |||
|align = "justify" bgcolor = "#E6E6E6" colspan = "2"|<font color="#800000">…</font> | |||
|- | |||
|align = "justify"| | |||
|align = "justify" bgcolor = "#E6E6E6" colspan = "2"|<font color="#800000">File Entry ''n''</font> | |||
|- | |||
|align = "justify"| | |||
|align = "justify"| | |||
|align = "justify"|4 - File Offset | |||
|- | |||
|align = "justify" colspan = "3"| | |||
|- | |||
|align = "justify" bgcolor = "#D9D9D9" colspan = "3"|<font color="#000080">Lengths Directory</font> | |||
|- | |||
|align = "justify"| | |||
|align = "justify" bgcolor = "#E6E6E6" colspan = "2"|<font color="#800000">File Entry 1</font> | |||
|- | |||
|align = "justify"| | |||
|align = "justify"| | |||
|align = "justify"|4 - File Size | |||
|- | |||
|align = "justify"| | |||
|align = "justify" bgcolor = "#E6E6E6" colspan = "2"|<font color="#800000">File Entry 2</font> | |||
|- | |||
|align = "justify"| | |||
|align = "justify"| | |||
|align = "justify"|4 - File Size | |||
|- | |||
|align = "justify"| | |||
|align = "justify" bgcolor = "#E6E6E6" colspan = "2"|<font color="#800000">…</font> | |||
|- | |||
|align = "justify"| | |||
|align = "justify" bgcolor = "#E6E6E6" colspan = "2"|<font color="#800000">File Entry ''n''</font> | |||
|- | |||
|align = "justify"| | |||
|align = "justify"| | |||
|align = "justify"|4 - File Size | |||
|- | |||
|align = "justify" colspan = "3"| | |||
|- | |||
|align = "justify" bgcolor = "#D9D9D9" colspan = "3"|<font color="#000080">Filenames Directory</font> | |||
|- | |||
|align = "justify"| | |||
|align = "justify" bgcolor = "#E6E6E6" colspan = "2"|<font color="#800000">File Entry 1</font> | |||
|- | |||
|align = "justify"| | |||
|align = "justify"| | |||
|align = "justify"|X - Filename | |||
|- | |||
|align = "justify"| | |||
|align = "justify" bgcolor = "#E6E6E6" colspan = "2"|<font color="#800000">File Entry 2</font> | |||
|- | |||
|align = "justify"| | |||
|align = "justify"| | |||
|align = "justify"|X - Filename | |||
|- | |||
|align = "justify"| | |||
|align = "justify" bgcolor = "#E6E6E6" colspan = "2"|<font color="#800000">…</font> | |||
|- | |||
|align = "justify"| | |||
|align = "justify" bgcolor = "#E6E6E6" colspan = "2"|<font color="#800000">File Entry ''n''</font> | |||
|- | |||
|align = "justify"| | |||
|align = "justify"| | |||
|align = "justify"|X - Filename | |||
|- | |||
|align = "justify" colspan = "3"| | |||
|- | |||
|align = "justify" bgcolor = "#D9D9D9" colspan = "3"|<font color="#000080">File Data</font> | |||
|- | |||
|align = "justify"| | |||
|align = "justify" bgcolor = "#E6E6E6" colspan = "2"|<font color="#800000">File Data 1</font> | |||
|- | |||
|align = "justify"| | |||
|align = "justify"| | |||
|align = "justify"|X - File Data | |||
|- | |||
|align = "justify"| | |||
|align = "justify" bgcolor = "#E6E6E6" colspan = "2"|<font color="#800000">File Data 2</font> | |||
|- | |||
|align = "justify"| | |||
|align = "justify"| | |||
|align = "justify"|X - File Data | |||
|- | |||
|align = "justify"| | |||
|align = "justify" bgcolor = "#E6E6E6" colspan = "2"|<font color="#800000">…</font> | |||
|- | |||
|align = "justify"| | |||
|align = "justify" bgcolor = "#E6E6E6" colspan = "2"|<font color="#800000">File Data ''n''</font> | |||
|- | |||
|align = "justify"| | |||
|align = "justify"| | |||
|align = "justify"|X - File Data | |||
|- | |||
|} | |||
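With three parallel directories, each directory can be read as one array and the arrays combined by position. The Python sketch below assumes the three directories have already been sliced out of the archive as byte strings; the values are made up, and null-terminated names are an assumption.

```python
import struct

count = 3

# Hypothetical directory contents, as they might appear in the archive.
offsets_dir = struct.pack("<3I", 100, 105, 108)
lengths_dir = struct.pack("<3I", 5, 3, 12)
names_dir = b"a.txt\x00b.bin\x00c.dat\x00"

# Read each directory as a whole array of little-endian 32-bit numbers.
offsets = struct.unpack(f"<{count}I", offsets_dir)
lengths = struct.unpack(f"<{count}I", lengths_dir)
names = [n.decode("ascii") for n in names_dir.split(b"\x00")[:count]]

# Entry i is made of the i-th item from each directory.
entries = list(zip(names, offsets, lengths))
```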
===External Directory Archives=== | |||
External Directory Archives have the same structure as the Directory Archive, except that the directory data and the file data are stored in 2 separate files. The file that contains the file data is typically very large, and the directory file very small.
The 2 files usually share the same name, but have different extensions. The extensions of the files can be anything, although some common extensions for the directory are <font color="#800000"><nowiki>*</nowiki>.dir</font>, <font color="#800000"><nowiki>*</nowiki>.fat</font>, and <font color="#800000"><nowiki>*</nowiki>.idx</font>.
Here is a sample graphic representation of this archive type, where the <font color="#008000">''Example.dir''</font> file contains the directory information, and the <font color="#008000">''Example.dat''</font> file contains the file data:
<font color="#008000">''Example.dir''</font> | |||
{|border="2" cellspacing="0" cellpadding="4" width="90%" | |||
|align = "justify" bgcolor = "#D9D9D9" colspan = "3"|<font color="#000080">Archive Header</font> | |||
|- | |||
|align = "justify"| | |||
|align = "justify" colspan = "2"|4 - Number of Files | |||
|- | |||
|align = "justify" colspan = "3"| | |||
|- | |||
|align = "justify" bgcolor = "#D9D9D9" colspan = "3"|<font color="#000080">Directory</font> | |||
|- | |||
|align = "justify"| | |||
|align = "justify" bgcolor = "#E6E6E6" colspan = "2"|<font color="#800000">File Entry 1</font> | |||
|- | |||
|align = "justify"| | |||
|align = "justify"| | |||
|align = "justify"|4 - File Offset<br>4 - File Size<br>X - Filename | |||
|- | |||
|align = "justify"| | |||
|align = "justify" bgcolor = "#E6E6E6" colspan = "2"|<font color="#800000">File Entry 2</font> | |||
|- | |||
|align = "justify"| | |||
|align = "justify"| | |||
|align = "justify"|4 - File Offset<br>4 - File Size<br>X - Filename | |||
|- | |||
|align = "justify"| | |||
|align = "justify" bgcolor = "#E6E6E6" colspan = "2"|<font color="#800000">…</font> | |||
|- | |||
|align = "justify"| | |||
|align = "justify" bgcolor = "#E6E6E6" colspan = "2"|<font color="#800000">File Entry ''n''</font> | |||
|- | |||
|align = "justify"| | |||
|align = "justify"| | |||
|align = "justify"|4 - File Offset<br>4 - File Size<br>X - Filename | |||
|- | |||
|align = "justify" colspan = "3"|<br><br><font color="#008000">''Example.dat''</font> | |||
|- | |||
|align = "justify" bgcolor = "#D9D9D9" colspan = "3"|<font color="#000080">File Data</font> | |||
|- | |||
|align = "justify"| | |||
|align = "justify" bgcolor = "#E6E6E6" colspan = "2"|<font color="#800000">File Data 1</font> | |||
|- | |||
|align = "justify"| | |||
|align = "justify"| | |||
|align = "justify"|X - File Data | |||
|- | |||
|align = "justify"| | |||
|align = "justify" bgcolor = "#E6E6E6" colspan = "2"|<font color="#800000">File Data 2</font> | |||
|- | |||
|align = "justify"| | |||
|align = "justify"| | |||
|align = "justify"|X - File Data | |||
|- | |||
|align = "justify"| | |||
|align = "justify" bgcolor = "#E6E6E6" colspan = "2"|<font color="#800000">…</font> | |||
|- | |||
|align = "justify"| | |||
|align = "justify" bgcolor = "#E6E6E6" colspan = "2"|<font color="#800000">File Data ''n''</font> | |||
|- | |||
|align = "justify"| | |||
|align = "justify"| | |||
|align = "justify"|X - File Data | |||
|- | |||
|} | |||
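A possible way to read such a pair of files is sketched below in Python. It assumes the data file has the same base name with a <code>.dat</code> extension, that offsets are relative to the start of the data file, and that filenames are null-terminated; <code>read_external</code> is a hypothetical helper, and the sample files are written to a temporary folder.

```python
import struct
import tempfile
from pathlib import Path


def read_external(dir_path):
    """Return {filename: data} using a directory file and its data file."""
    dir_path = Path(dir_path)
    dat_path = dir_path.with_suffix(".dat")   # assumption: same name, .dat
    raw = dir_path.read_bytes()
    (count,) = struct.unpack_from("<I", raw, 0)
    pos = 4
    files = {}
    with open(dat_path, "rb") as dat:
        for _ in range(count):
            offset, size = struct.unpack_from("<II", raw, pos)
            pos += 8
            end = raw.index(b"\x00", pos)     # find the name terminator
            name = raw[pos:end].decode("ascii")
            pos = end + 1
            dat.seek(offset)                  # offset within the data file
            files[name] = dat.read(size)
    return files


with tempfile.TemporaryDirectory() as tmp:
    base = Path(tmp) / "Example"
    base.with_suffix(".dat").write_bytes(b"helloABC")
    base.with_suffix(".dir").write_bytes(
        struct.pack("<I", 2)
        + struct.pack("<II", 0, 5) + b"a.txt\x00"
        + struct.pack("<II", 5, 3) + b"b.bin\x00"
    )
    files = read_external(base.with_suffix(".dir"))
```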
===Chunked Archives=== | |||
Chunked Archives are a simple structure where the files are stored one after the other. Each file has its own header that gives information about the file, particularly the <font color="#008000">''file size''</font>. These archives, probably the simplest of all the archive types, are examined by reading the header of a file, skipping its file data, then repeating for the remaining files until you reach the end of the archive.
One thing to note: these archives typically don<nowiki>’</nowiki>t store filenames, rather they store a 4-byte String that can be treated like the files<nowiki>’</nowiki> extension. | |||
Here is an example of this archive type:
{|border="2" cellspacing="0" cellpadding="4" width="90%" | |||
|align = "justify" bgcolor = "#D9D9D9" colspan = "4"|<font color="#000080">Archive Header</font> | |||
|- | |||
|align = "justify"| | |||
|align = "justify" colspan = "3"|4 - Header Tag (String) | |||
|- | |||
|align = "justify" colspan = "4"| | |||
|- | |||
|align = "justify" bgcolor = "#D9D9D9" colspan = "4"|<font color="#000080">Chunks</font> | |||
|- | |||
|align = "justify"| | |||
|align = "justify" bgcolor = "#E0E0E0" colspan = "3"|<font color="#008000">File 1</font> | |||
|- | |||
|align = "justify"| | |||
|align = "justify"| | |||
|align = "justify" bgcolor = "#E6E6E6" colspan = "2"|<font color="#800000">File Header 1</font> | |||
|- | |||
|align = "justify"| | |||
|align = "justify"| | |||
|align = "justify"| | |||
|align = "justify"|4 - File Type (String)<br>4 - File Size | |||
|- | |||
|align = "justify"| | |||
|align = "justify"| | |||
|align = "justify" bgcolor = "#E6E6E6" colspan = "2"|<font color="#800000">File Data 1</font> | |||
|- | |||
|align = "justify"| | |||
|align = "justify"| | |||
|align = "justify"| | |||
|align = "justify"|X - File Data | |||
|- | |||
|align = "justify"| | |||
|align = "justify" bgcolor = "#E0E0E0" colspan = "3"|<font color="#008000">File 2</font> | |||
|- | |||
|align = "justify"| | |||
|align = "justify"| | |||
|align = "justify" bgcolor = "#E6E6E6" colspan = "2"|<font color="#800000">File Header 2</font> | |||
|- | |||
|align = "justify"| | |||
|align = "justify"| | |||
|align = "justify"| | |||
|align = "justify"|4 - File Type (String)<br>4 - File Size | |||
|- | |||
|align = "justify"| | |||
|align = "justify"| | |||
|align = "justify" bgcolor = "#E6E6E6" colspan = "2"|<font color="#800000">File Data 2</font> | |||
|- | |||
|align = "justify"| | |||
|align = "justify"| | |||
|align = "justify"| | |||
|align = "justify"|X - File Data | |||
|- | |||
|align = "justify"| | |||
|align = "justify" bgcolor = "#E0E0E0" colspan = "3"|<font color="#008000">…</font> | |||
|- | |||
|align = "justify"| | |||
|align = "justify" bgcolor = "#E0E0E0" colspan = "3"|<font color="#008000">File ''n''</font> | |||
|- | |||
|align = "justify"| | |||
|align = "justify"| | |||
|align = "justify" bgcolor = "#E6E6E6" colspan = "2"|<font color="#800000">File Header ''n''</font> | |||
|- | |||
|align = "justify"| | |||
|align = "justify"| | |||
|align = "justify"| | |||
|align = "justify"|4 - File Type (String)<br>4 - File Size | |||
|- | |||
|align = "justify"| | |||
|align = "justify"| | |||
|align = "justify" bgcolor = "#E6E6E6" colspan = "2"|<font color="#800000">File Data ''n''</font> | |||
|- | |||
|align = "justify"| | |||
|align = "justify"| | |||
|align = "justify"| | |||
|align = "justify"|X - File Data | |||
|- | |||
|} | |||
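The read-header, skip-data loop described for this archive type can be sketched in Python as follows. It assumes Little-Endian sizes and that the file size counts only the data bytes, not the 8-byte file header; the function name and the sample archive are invented.

```python
import io
import struct


def read_chunks(f):
    """Return (tag, [(file_type, data_offset, size), ...])."""
    tag = f.read(4)
    chunks = []
    while True:
        header = f.read(8)
        if len(header) < 8:            # reached the end of the archive
            break
        kind, size = struct.unpack("<4sI", header)
        data_offset = f.tell()         # the file data starts right here
        f.seek(size, io.SEEK_CUR)      # skip the data to reach the next header
        chunks.append((kind.decode("ascii"), data_offset, size))
    return tag, chunks


archive = (
    b"HEAD"
    + b"WAVE" + struct.pack("<I", 5) + b"hello"
    + b"BMP " + struct.pack("<I", 3) + b"\x01\x02\x03"
)
tag, chunks = read_chunks(io.BytesIO(archive))
```

Recording the data offset while scanning effectively builds the directory that the archive itself does not store.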
===Split Chunk Archives=== | |||
Split Chunk Archives have the same basic structure as the Chunked Archives, except that each file is also split up into chunks. Each file chunk is usually the same size (except for the last chunk in each file), which allows efficient use of buffers when reading the file.
Here is a sample graphic representation of this archive type: | |||
{|border="2" cellspacing="0" cellpadding="4" width="90%" | |||
|align = "justify" bgcolor = "#D9D9D9" colspan = "4"|<font color="#000080">Archive Header</font> | |||
|- | |||
|align = "justify"| | |||
|align = "justify" colspan = "3"|4 - Header Tag (String) | |||
|- | |||
|align = "justify" colspan = "4"| | |||
|- | |||
|align = "justify" bgcolor = "#D9D9D9" colspan = "4"|<font color="#000080">Chunks</font> | |||
|- | |||
|align = "justify"| | |||
|align = "justify" bgcolor = "#E0E0E0" colspan = "3"|<font color="#008000">File 1</font> | |||
|- | |||
|align = "justify"| | |||
|align = "justify"| | |||
|align = "justify" bgcolor = "#E6E6E6" colspan = "2"|<font color="#800000">File Header 1</font> | |||
|- | |||
|align = "justify"| | |||
|align = "justify"| | |||
|align = "justify"| | |||
|align = "justify"|4 - File Type (String)<br>4 - File Size<br>4 - Number Of Chunks<br>4 - Chunk Size | |||
|- | |||
|align = "justify"| | |||
|align = "justify"| | |||
|align = "justify" bgcolor = "#E6E6E6" colspan = "2"|<font color="#800000">File Data 1</font> | |||
|- | |||
|align = "justify"| | |||
|align = "justify"| | |||
|align = "justify"| | |||
|align = "justify"|X - File Data Chunk 1<br>X - File Data Chunk 2<br>…<br>X - File Data Chunk ''n'' | |||
|- | |||
|align = "justify"| | |||
|align = "justify" bgcolor = "#E0E0E0" colspan = "3"|<font color="#008000">File 2</font> | |||
|- | |||
|align = "justify"| | |||
|align = "justify"| | |||
|align = "justify" bgcolor = "#E6E6E6" colspan = "2"|<font color="#800000">File Header 2</font> | |||
|- | |||
|align = "justify"| | |||
|align = "justify"| | |||
|align = "justify"| | |||
|align = "justify"|4 - File Type (String)<br>4 - File Size<br>4 - Number Of Chunks<br>4 - Chunk Size | |||
|- | |||
|align = "justify"| | |||
|align = "justify"| | |||
|align = "justify" bgcolor = "#E6E6E6" colspan = "2"|<font color="#800000">File Data 2</font> | |||
|- | |||
|align = "justify"| | |||
|align = "justify"| | |||
|align = "justify"| | |||
|align = "justify"|X - File Data Chunk 1<br>X - File Data Chunk 2<br>…<br>X - File Data Chunk ''n'' | |||
|- | |||
|align = "justify"| | |||
|align = "justify" bgcolor = "#E0E0E0" colspan = "3"|<font color="#008000">…</font> | |||
|- | |||
|align = "justify"| | |||
|align = "justify" bgcolor = "#E0E0E0" colspan = "3"|<font color="#008000">File ''n''</font> | |||
|- | |||
|align = "justify"| | |||
|align = "justify"| | |||
|align = "justify" bgcolor = "#E6E6E6" colspan = "2"|<font color="#800000">File Header ''n''</font> | |||
|- | |||
|align = "justify"| | |||
|align = "justify"| | |||
|align = "justify"| | |||
|align = "justify"|4 - File Type (String)<br>4 - File Size<br>4 - Number Of Chunks<br>4 - Chunk Size | |||
|- | |||
|align = "justify"| | |||
|align = "justify"| | |||
|align = "justify" bgcolor = "#E6E6E6" colspan = "2"|<font color="#800000">File Data ''n''</font> | |||
|- | |||
|align = "justify"| | |||
|align = "justify"| | |||
|align = "justify"| | |||
|align = "justify"|X - File Data Chunk 1<br>X - File Data Chunk 2<br>…<br>X - File Data Chunk ''n'' | |||
|- | |||
|} | |||
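To show how the chunk fields fit together, here is a Python sketch that reads one file record from this layout. It assumes the chunks are stored back to back with no per-chunk header, and that the last chunk simply holds whatever remains of the file size; the names and sample data are hypothetical.

```python
import io
import struct


def read_split_chunk_file(f):
    """Read one file record and return (file_type, list_of_chunks)."""
    kind, file_size, num_chunks, chunk_size = struct.unpack(
        "<4sIII", f.read(16)
    )
    parts = []
    remaining = file_size
    for _ in range(num_chunks):
        n = min(chunk_size, remaining)   # the last chunk may be shorter
        parts.append(f.read(n))
        remaining -= n
    return kind.decode("ascii"), parts


data = b"0123456789"                      # 10 bytes split into chunks of 4
record = b"TEXT" + struct.pack("<III", len(data), 3, 4) + data
kind, parts = read_split_chunk_file(io.BytesIO(record))
```

Joining the chunks with <code>b"".join(parts)</code> gives back the original file data.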
===Tree Archives=== | |||
Tree Archives are the most complicated of the archive types, and thankfully they are not used very often. The idea is that the archive tries to store a complete directory tree structure, such as the individual folders. This is usually done by creating a directory for each folder, and linking them together, as you will see in the example. | |||
Here is a sample graphic representation of this archive type, although there can be many variations:
{|border="2" cellspacing="0" cellpadding="4" width="90%" | |||
|align = "justify" bgcolor = "#D9D9D9" colspan = "3"|<font color="#000080">Archive Header</font> | |||
|- | |||
|align = "justify"| | |||
|align = "justify" colspan = "2"|4 - Header Tag (String)<br>4 - Number of Folders at the root<br>4 - Total Number of Files | |||
|- | |||
|align = "justify" colspan = "3"| | |||
|- | |||
|align = "justify" bgcolor = "#D9D9D9" colspan = "3"|<font color="#000080">Folder Entries</font> | |||
|- | |||
|align = "justify"| | |||
|align = "justify" bgcolor = "#E6E6E6" colspan = "2"|<font color="#800000">Folder Entry 1</font> | |||
|- | |||
|align = "justify"| | |||
|align = "justify"| | |||
|align = "justify"|X - Folder Name<br>4 - Number of Sub-folders in this folder<br>4 - Offset to the first Sub-folder entry for this folder<br>4 - Number of Files in this folder<br>4 - Offset to the first file entry for this folder | |||
|- | |||
|align = "justify"| | |||
|align = "justify" bgcolor = "#E6E6E6" colspan = "2"|<font color="#800000">Folder Entry 2</font> | |||
|- | |||
|align = "justify"| | |||
|align = "justify"| | |||
|align = "justify"|X - Folder Name<br>4 - Number of Sub-folders in this folder<br>4 - Offset to the first Sub-folder entry for this folder<br>4 - Number of Files in this folder<br>4 - Offset to the first file entry for this folder | |||
|- | |||
|align = "justify"| | |||
|align = "justify" bgcolor = "#E6E6E6" colspan = "2"|<font color="#800000">…</font> | |||
|- | |||
|align = "justify"| | |||
|align = "justify" bgcolor = "#E6E6E6" colspan = "2"|<font color="#800000">Folder Entry ''n''</font> | |||
|- | |||
|align = "justify"| | |||
|align = "justify"| | |||
|align = "justify"|X - Folder Name<br>4 - Number of Sub-folders in this folder<br>4 - Offset to the first Sub-folder entry for this folder<br>4 - Number of Files in this folder<br>4 - Offset to the first file entry for this folder | |||
|- | |||
|align = "justify" colspan = "3"| | |||
|- | |||
|align = "justify" bgcolor = "#D9D9D9" colspan = "3"|<font color="#000080">File Entries</font> | |||
|- | |||
|align = "justify"| | |||
|align = "justify" bgcolor = "#E6E6E6" colspan = "2"|<font color="#800000">File Entry 1</font> | |||
|- | |||
|align = "justify"| | |||
|align = "justify"| | |||
|align = "justify"|4 - File Offset<br>4 - File Size<br>X - Filename | |||
|- | |||
|align = "justify"| | |||
|align = "justify" bgcolor = "#E6E6E6" colspan = "2"|<font color="#800000">File Entry 2</font> | |||
|- | |||
|align = "justify"| | |||
|align = "justify"| | |||
|align = "justify"|4 - File Offset<br>4 - File Size<br>X - Filename | |||
|- | |||
|align = "justify"| | |||
|align = "justify" bgcolor = "#E6E6E6" colspan = "2"|<font color="#800000">…</font> | |||
|- | |||
|align = "justify"| | |||
|align = "justify" bgcolor = "#E6E6E6" colspan = "2"|<font color="#800000">File Entry ''n''</font> | |||
|- | |||
|align = "justify"| | |||
|align = "justify"| | |||
|align = "justify"|4 - File Offset<br>4 - File Size<br>X - Filename | |||
|- | |||
|align = "justify" colspan = "3"| | |||
|- | |||
|align = "justify" bgcolor = "#D9D9D9" colspan = "3"|<font color="#000080">File Data</font> | |||
|- | |||
|align = "justify"| | |||
|align = "justify" bgcolor = "#E6E6E6" colspan = "2"|<font color="#800000">File Data 1</font> | |||
|- | |||
|align = "justify"| | |||
|align = "justify"| | |||
|align = "justify"|X - File Data | |||
|- | |||
|align = "justify"| | |||
|align = "justify" bgcolor = "#E6E6E6" colspan = "2"|<font color="#800000">File Data 2</font> | |||
|- | |||
|align = "justify"| | |||
|align = "justify"| | |||
|align = "justify"|X - File Data | |||
|- | |||
|align = "justify"| | |||
|align = "justify" bgcolor = "#E6E6E6" colspan = "2"|<font color="#800000">…</font> | |||
|- | |||
|align = "justify"| | |||
|align = "justify" bgcolor = "#E6E6E6" colspan = "2"|<font color="#800000">File Data ''n''</font> | |||
|- | |||
|align = "justify"| | |||
|align = "justify"| | |||
|align = "justify"|X - File Data | |||
|- | |||
|} | |||
As this archive type is quite difficult to explain, I will provide an example here. Let<nowiki>’</nowiki>s pretend that our archive contains 3 files, as specified below:
<font color="#FF6600">\data\sounds\snd1.wav | |||
\data\sounds\snd2.wav | |||
\data\images\temp\pic1.bmp | |||
</font> | |||
The following diagram shows the structure of the archive that contains these 3 files (with the values of each field shown in <font color="#008000">''green''</font>):
{|border="2" cellspacing="0" cellpadding="4" width="90%" | |||
|align = "justify" bgcolor = "#D9D9D9" colspan = "3"|<font color="#000080">Archive Header</font> | |||
|- | |||
|align = "justify"| | |||
|align = "justify" colspan = "2"|4 - Header Tag (String) <font color="#008000">HEAD</font><br>4 - Number of Folders at the root <font color="#008000">1</font><br>4 - Total Number of Files <font color="#008000">3</font>
|- | |||
|align = "justify" colspan = "3"| | |||
|- | |||
|align = "justify" bgcolor = "#D9D9D9" colspan = "3"|<font color="#000080">Folder Entries</font> | |||
|- | |||
|align = "justify"| | |||
|align = "justify" bgcolor = "#E6E6E6" colspan = "2"|<font color="#800000">Folder Entry 1</font> | |||
|- | |||
|align = "justify"| | |||
|align = "justify"| | |||
|align = "justify"|X - Folder Name <font color="#008000">data</font><br>4 - Number of Sub-folders in this folder <font color="#008000">2</font><br>4 - Offset to first Sub-folder <font color="#008000">offset to Folder Entry 2</font><br>4 - Number of Files in this folder <font color="#008000">0</font><br>4 - Offset to first file entry <font color="#008000">0</font>
|- | |||
|align = "justify"| | |||
|align = "justify" bgcolor = "#E6E6E6" colspan = "2"|<font color="#800000">Folder Entry 2</font> | |||
|- | |||
|align = "justify"| | |||
|align = "justify"| | |||
|align = "justify"|X - Folder Name <font color="#008000">sounds</font><br>4 - Number of Sub-folders in this folder <font color="#008000">0</font><br>4 - Offset to first Sub-folder <font color="#008000">0</font><br>4 - Number of Files in this folder <font color="#008000">2</font><br>4 - Offset to first file entry <font color="#008000">offset to File Entry 1</font>
|- | |||
|align = "justify"| | |||
|align = "justify" bgcolor = "#E6E6E6" colspan = "2"|<font color="#800000">Folder Entry 3</font> | |||
|- | |||
|align = "justify"| | |||
|align = "justify"| | |||
|align = "justify"|X - Folder Name <font color="#008000">images</font><br>4 - Number of Sub-folders in this folder <font color="#008000">1</font><br>4 - Offset to first Sub-folder <font color="#008000">offset to Folder Entry 4</font><br>4 - Number of Files in this folder <font color="#008000">0</font><br>4 - Offset to first file entry <font color="#008000">0</font>
|- | |||
|align = "justify"| | |||
|align = "justify" bgcolor = "#E6E6E6" colspan = "2"|<font color="#800000">Folder Entry 4</font> | |||
|- | |||
|align = "justify"| | |||
|align = "justify"| | |||
|align = "justify"|X - Folder Name <font color="#008000">temp</font><br>4 - Number of Sub-folders in this folder <font color="#008000">0</font><br>4 - Offset to first Sub-folder <font color="#008000">0</font><br>4 - Number of Files in this folder <font color="#008000">1</font><br>4 - Offset to first file entry <font color="#008000">offset to File Entry 3</font>
|- | |||
|align = "justify" colspan = "3"| | |||
|- | |||
|align = "justify" bgcolor = "#D9D9D9" colspan = "3"|<font color="#000080">File Entries</font> | |||
|- | |||
|align = "justify"| | |||
|align = "justify" bgcolor = "#E6E6E6" colspan = "2"|<font color="#800000">File Entry 1</font> | |||
|- | |||
|align = "justify"| | |||
|align = "justify"| | |||
|align = "justify"|4 - File Offset <font color="#008000">offset to File Data 1</font><br>4 - File Size <font color="#008000">length of File Data 1</font><br>X - Filename <font color="#008000">snd1.wav</font>
|- | |||
|align = "justify"| | |||
|align = "justify" bgcolor = "#E6E6E6" colspan = "2"|<font color="#800000">File Entry 2</font> | |||
|- | |||
|align = "justify"| | |||
|align = "justify"| | |||
|align = "justify"|4 - File Offset <font color="#008000">offset to File Data 2</font><br>4 - File Size <font color="#008000">length of File Data 2</font><br>X - Filename <font color="#008000">snd2.wav</font>
|- | |||
|align = "justify"| | |||
|align = "justify" bgcolor = "#E6E6E6" colspan = "2"|<font color="#800000">File Entry 3</font> | |||
|- | |||
|align = "justify"| | |||
|align = "justify"| | |||
|align = "justify"|4 - File Offset <font color="#008000">offset to File Data 3</font><br>4 - File Size length of File Data 3<br>X - Filename pic1.bmp | |||
|- | |||
|align = "justify" colspan = "3"| | |||
|- | |||
|align = "justify" bgcolor = "#D9D9D9" colspan = "3"|<font color="#000080">File Data</font> | |||
|- | |||
|align = "justify"| | |||
|align = "justify" bgcolor = "#E6E6E6" colspan = "2"|<font color="#800000">File Data 1</font> | |||
|- | |||
|align = "justify"| | |||
|align = "justify"| | |||
|align = "justify"|X - File Data <font color="#008000">the data for file snd1.wav</font> | |||
|- | |||
|align = "justify"| | |||
|align = "justify" bgcolor = "#E6E6E6" colspan = "2"|<font color="#800000">File Data 2</font> | |||
|- | |||
|align = "justify"| | |||
|align = "justify"| | |||
|align = "justify"|X - File Data <font color="#008000">the data for file snd2.wav</font> | |||
|- | |||
|align = "justify"| | |||
|align = "justify" bgcolor = "#E6E6E6" colspan = "2"|<font color="#800000">File Data 3</font> | |||
|- | |||
|align = "justify"| | |||
|align = "justify"| | |||
|align = "justify"|X - File Data <font color="#008000">the data for file pic1.bmp</font> | |||
|- | |||
|} | |||
Let's walk through the reading of this file. The color ''orange'' indicates the name of a field in the example. The other colors are the same as the colors in the example.
First we read the Archive Header and see that there is only 1 folder at the root. This lets us know that we now need to read a single folder entry. | |||
We read Folder Entry 1, called temp, and are told there are 2 sub-folders. The sub-folders start at a certain offset in the archive. | |||
So we skip to the offset of the sub-folders. For each of the 2 sub-folders, we need to read a folder entry. The first folder entry read is called sounds (Folder Entry 2) and there are 2 files in it. The second entry is called images (Folder Entry 3) and there is 1 sub-folder in it. | |||
So we jump to the offset for the first file entry for the sounds folder, and read 2 file entries, namely snd1.wav (File Entry 1) and snd2.wav (File Entry 2). After we have read these, we jump back to where we were. We have finished with everything in the sounds folder, so we move on to the offset of the sub-folders for the images folder. We read 1 folder entry, called temp (Folder Entry 4), which has 1 file in it.
We jump forward to the offset for the first file entry and read 1 file entry, called pic1.bmp (File Entry 3). We know that the total number of files is 3, so now we have finished reading the tree. | |||
Using this method, we can build up a complex directory tree. This type of archive is usually slightly smaller than the plain directory archive, because the filenames don't have to repeat the entire folder string for each entry. The compromise is that it takes longer to read, because the reader has to jump all over the file. For this reason, and because the structure is very complex, only a few games use this type of structure.
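The walkthrough above can be sketched in Python. This is only an illustration under several assumptions that the example does not state: all 4-byte fields are unsigned little-endian numbers, the variable-length "X" names are null-terminated ASCII, the header is a 4-byte tag followed by the number of root folders and the total number of files, and the root folder entries follow the header directly. The function names are hypothetical.

```python
import io
import struct

def read_name(f):
    """Read a null-terminated ASCII name (an assumed encoding for the 'X' fields)."""
    out = bytearray()
    while (b := f.read(1)) not in (b"", b"\x00"):
        out += b
    return out.decode("ascii")

def read_folders(f, offset, count, parent, found):
    """Read `count` consecutive folder entries at `offset`, then follow their offsets."""
    f.seek(offset)
    entries = []
    for _ in range(count):
        name = read_name(f)
        # sub-folder count, sub-folder offset, file count, file entry offset
        entries.append((name, *struct.unpack("<4I", f.read(16))))
    for name, n_sub, sub_off, n_files, file_off in entries:
        path = parent + name + "/"
        if n_files:
            f.seek(file_off)              # jump to this folder's file entries
            for _ in range(n_files):
                data_off, size = struct.unpack("<2I", f.read(8))
                found.append((path + read_name(f), data_off, size))
        if n_sub:
            read_folders(f, sub_off, n_sub, path, found)  # jump to the sub-folders
    return found

def read_tree_archive(data):
    """Return (path, file offset, file size) for every file in the archive bytes."""
    f = io.BytesIO(data)
    tag, n_root, n_total = struct.unpack("<4s2I", f.read(12))  # assumed header layout
    return read_folders(f, f.tell(), n_root, "", [])
```

Reading all sibling folder entries before following any of their offsets avoids having to remember where to "jump back" to, which is one way of handling the jumping described in the walkthrough.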
===Nested Tree Archives=== | |||
Nested Tree Archives, even though the name sounds hard, are a simpler version of the Tree Archive. The idea is the same as the Tree Archive: store a complete directory tree structure. However, the Nested Tree Archive can be read more efficiently, as there is no jumping around.
Here is a sample graphic representation of this archive type. The diagram is a little tricky to follow, so read the description and pseudo-code following it for a clearer explanation, then try to read the diagram:
{|border="2" cellspacing="0" cellpadding="4" width="90%" | |||
|align = "justify" bgcolor = "#D9D9D9" colspan = "7"|<font color="#000080">Archive Header</font> | |||
|- | |||
|align = "justify"| | |||
|align = "justify" colspan = "6"|4 - Header Tag (String)<br>4 - Number of Folders at the root<br>4 - Total Number of Files | |||
|- | |||
|align = "justify" colspan = "7"| | |||
|- | |||
|align = "justify" bgcolor = "#D9D9D9" colspan = "7"|<font color="#000080">Entries</font> | |||
|- | |||
|align = "justify"| | |||
|align = "justify" bgcolor = "#E0E0E0" colspan = "6"|<font color="#800000">Folder Entry 1</font> | |||
|- | |||
|align = "justify"| | |||
|align = "justify"| | |||
|align = "justify" colspan = "5"|X - Folder Name<br>4 - Number of Sub-folders<br>4 - Number of Files | |||
|- | |||
|align = "justify"| | |||
|align = "justify"| | |||
|align = "justify" bgcolor = "#E0E0E0" colspan = "5"|<font color="#008000">Sub-Folder Entries 1</font> | |||
|- | |||
|align = "justify"| | |||
|align = "justify"| | |||
|align = "justify" colspan = "4"| | |||
|align = "justify" bgcolor = "#E6E6E6"|<font color="#800000">Sub-Folder Entry 1a</font> | |||
|- | |||
|align = "justify"| | |||
|align = "justify"| | |||
|align = "justify" colspan = "3"| | |||
|align = "justify"| | |||
|align = "justify"|X - Folder Name<br>4 - Number of Sub-folders<br>4 - Number of Files | |||
|- | |||
|align = "justify"| | |||
|align = "justify"| | |||
|align = "justify" colspan = "3"| | |||
|align = "justify"| | |||
|align = "justify" bgcolor = "#E0E0E0"|<font color="#008000">Sub-Folder Entries 1.1</font> | |||
|- | |||
|align = "justify"| | |||
|align = "justify"| | |||
|align = "justify" colspan = "2"| | |||
|align = "justify"| | |||
|align = "justify"| | |||
|align = "justify" bgcolor = "#E6E6E6"|<font color="#800000">Sub-Folder Entry 1.1a</font> | |||
|- | |||
|align = "justify"| | |||
|align = "justify"| | |||
|align = "justify"| | |||
|align = "justify"| | |||
|align = "justify"| | |||
|align = "justify"| | |||
|align = "justify"|X - Folder Name<br>4 - Number of Sub-folders<br>4 - Number of Files | |||
|- | |||
|align = "justify"| | |||
|align = "justify"| | |||
|align = "justify" colspan = "2"| | |||
|align = "justify"| | |||
|align = "justify"| | |||
|align = "justify" bgcolor = "#E6E6E6"|<font color="#800000">…</font> | |||
|- | |||
|align = "justify"| | |||
|align = "justify"| | |||
|align = "justify" colspan = "2"| | |||
|align = "justify"| | |||
|align = "justify"| | |||
|align = "justify" bgcolor = "#E6E6E6"|<font color="#800000">Sub-Folder Entry 1.1''n''</font> | |||
|- | |||
|align = "justify"| | |||
|align = "justify"| | |||
|align = "justify"| | |||
|align = "justify"| | |||
|align = "justify"| | |||
|align = "justify"| | |||
|align = "justify"|X - Folder Name<br>4 - Number of Sub-folders<br>4 - Number of Files | |||
|- | |||
|align = "justify"| | |||
|align = "justify"| | |||
|align = "justify" colspan = "3"| | |||
|align = "justify"| | |||
|align = "justify" bgcolor = "#E0E0E0"|<font color="#008000">File Entries 1.1</font> | |||
|- | |||
|align = "justify"| | |||
|align = "justify"| | |||
|align = "justify" colspan = "2"| | |||
|align = "justify"| | |||
|align = "justify"| | |||
|align = "justify" bgcolor = "#E6E6E6"|<font color="#800000">File Entry 1.1a</font> | |||
|- | |||
|align = "justify"| | |||
|align = "justify"| | |||
|align = "justify"| | |||
|align = "justify"| | |||
|align = "justify"| | |||
|align = "justify"| | |||
|align = "justify"|4 - File Offset<br>4 - File Size<br>X - Filename | |||
|- | |||
|align = "justify"| | |||
|align = "justify"| | |||
|align = "justify" colspan = "2"| | |||
|align = "justify"| | |||
|align = "justify"| | |||
|align = "justify" bgcolor = "#E6E6E6"|<font color="#800000">…</font> | |||
|- | |||
|align = "justify"| | |||
|align = "justify"| | |||
|align = "justify" colspan = "2"| | |||
|align = "justify"| | |||
|align = "justify"| | |||
|align = "justify" bgcolor = "#E6E6E6"|<font color="#800000">File Entry 1.1''n''</font> | |||
|- | |||
|align = "justify"| | |||
|align = "justify"| | |||
|align = "justify"| | |||
|align = "justify"| | |||
|align = "justify"| | |||
|align = "justify"| | |||
|align = "justify"|4 - File Offset<br>4 - File Size<br>X - Filename | |||
|- | |||
|align = "justify"| | |||
|align = "justify"| | |||
|align = "justify" colspan = "4"| | |||
|align = "justify" bgcolor = "#E0E0E0"|<font color="#800000">…</font> | |||
|- | |||
|align = "justify"| | |||
|align = "justify"| | |||
|align = "justify" colspan = "4"| | |||
|align = "justify" bgcolor = "#E0E0E0"|<font color="#800000">Sub-Folder Entry 1''n''</font> | |||
|- | |||
|align = "justify"| | |||
|align = "justify"| | |||
|align = "justify" colspan = "3"| | |||
|align = "justify"| | |||
|align = "justify"|X - Folder Name<br>4 - Number of Sub-folders<br>4 - Number of Files | |||
|- | |||
|align = "justify"| | |||
|align = "justify"| | |||
|align = "justify" bgcolor = "#E0E0E0" colspan = "5"|<font color="#008000">File Entries 1</font> | |||
|- | |||
|align = "justify"| | |||
|align = "justify"| | |||
|align = "justify" colspan = "4"| | |||
|align = "justify" bgcolor = "#E6E6E6"|<font color="#800000">File Entry 1a</font> | |||
|- | |||
|align = "justify"| | |||
|align = "justify"| | |||
|align = "justify" colspan = "3"| | |||
|align = "justify"| | |||
|align = "justify"|4 - File Offset<br>4 - File Size<br>X - Filename | |||
|- | |||
|align = "justify"| | |||
|align = "justify"| | |||
|align = "justify" colspan = "4"| | |||
|align = "justify" bgcolor = "#E6E6E6"|<font color="#800000">…</font> | |||
|- | |||
|align = "justify"| | |||
|align = "justify"| | |||
|align = "justify" colspan = "4"| | |||
|align = "justify" bgcolor = "#E6E6E6"|<font color="#800000">File Entry 1''n''</font> | |||
|- | |||
|align = "justify"| | |||
|align = "justify"| | |||
|align = "justify" colspan = "3"| | |||
|align = "justify"| | |||
|align = "justify"|4 - File Offset<br>4 - File Size<br>X - Filename | |||
|- | |||
|align = "justify"| | |||
|align = "justify" bgcolor = "#E6E6E6" colspan = "6"|<font color="#800000">…</font> | |||
|- | |||
|align = "justify"| | |||
|align = "justify" bgcolor = "#E6E6E6" colspan = "6"|<font color="#800000">Folder Entry ''n''</font> | |||
|- | |||
|align = "justify"| | |||
|align = "justify"| | |||
|align = "justify" colspan = "5"|X - Folder Name<br>4 - Number of Sub-folders<br>4 - Number of Files | |||
|- | |||
|align = "justify"| | |||
|align = "justify"| | |||
|align = "justify" bgcolor = "#E0E0E0" colspan = "5"|<font color="#008000">Sub-Folder Entries ''n''</font> | |||
|- | |||
|align = "justify"| | |||
|align = "justify"| | |||
|align = "justify" bgcolor = "#E0E0E0" colspan = "5"|<font color="#008000">File Entries ''n''</font> | |||
|- | |||
|align = "justify" colspan = "7"| | |||
|- | |||
|align = "justify" bgcolor = "#D9D9D9" colspan = "7"|<font color="#000080">File Data</font> | |||
|- | |||
|align = "justify"| | |||
|align = "justify" bgcolor = "#E6E6E6" colspan = "6"|<font color="#800000">File Data 1</font> | |||
|- | |||
|align = "justify"| | |||
|align = "justify"| | |||
|align = "justify" colspan = "5"|X - File Data | |||
|- | |||
|align = "justify"| | |||
|align = "justify" bgcolor = "#E6E6E6" colspan = "6"|<font color="#800000">File Data 2</font> | |||
|- | |||
|align = "justify"| | |||
|align = "justify"| | |||
|align = "justify" colspan = "5"|X - File Data | |||
|- | |||
|align = "justify"| | |||
|align = "justify" bgcolor = "#E6E6E6" colspan = "6"|<font color="#800000">…</font> | |||
|- | |||
|align = "justify"| | |||
|align = "justify" bgcolor = "#E6E6E6" colspan = "6"|<font color="#800000">File Data ''n''</font> | |||
|- | |||
|align = "justify"| | |||
|align = "justify"| | |||
|align = "justify" colspan = "5"|X - File Data | |||
|- | |||
|} | |||
The diagram may seem difficult, but that is mostly due to the fact that it is nested, meaning that it can have as many directories-inside-directories as you like. | |||
This is the way it works. You first read the Archive Header, and see how many folders at the root there are. Usually there will only be 1 folder at the root. | |||
So you read 1 Folder Entry, and find out the number of sub-folders, and the number of files. For every sub-folder, you repeat again from Folder Entry 1. When you have read all the sub-folders, you then read all the File Entries. | |||
If you can read pseudo-code, here is the kind of thing I am trying to describe:
'''method '''<font color="#000080">'''readArchive'''</font>'''()<nowiki>{</nowiki>''' | |||
''' read ('''<font color="#800000">'''FolderEntry'''</font>''');''' | |||
''' for each ('''<font color="#008000">'''sub-folder'''</font>''')<nowiki>{</nowiki>''' | |||
''' '''<font color="#000080">'''readArchive'''</font>'''();'''
''' <nowiki>}</nowiki>''' | |||
''' for each ('''<font color="#008000">'''file'''</font>''')<nowiki>{</nowiki>''' | |||
''' read ('''<font color="#800000">'''FileEntry'''</font>''');''' | |||
''' <nowiki>}</nowiki>''' | |||
''' <nowiki>}</nowiki>''' | |||
So, you begin by reading a <font color="#800000">FolderEntry</font>. If the entry has <font color="#008000">sub-folders</font> in it, you must immediately read the entries for those sub-folders, by repeating the process <font color="#000080">from the beginning</font>. When all the sub-folder entries for this <font color="#800000">FolderEntry</font> have been read, you can then progress and read the <font color="#800000">FileEntries</font> for the folder. | |||
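For readers who prefer runnable code, the pseudo-code above might look like this in Python. It is a minimal sketch, assuming little-endian 4-byte fields and null-terminated ASCII names for the "X" fields; neither is specified by the diagram, and the function names are hypothetical.

```python
import io
import struct

def read_name(f):
    """Read a null-terminated ASCII name (an assumption; a real format
    might use a length prefix instead)."""
    out = bytearray()
    while (b := f.read(1)) not in (b"", b"\x00"):
        out += b
    return out.decode("ascii")

def read_folder(f, parent, found):
    """Read one FolderEntry, then its sub-folders, then its FileEntries."""
    name = read_name(f)
    n_sub, n_files = struct.unpack("<2I", f.read(8))
    path = parent + name + "/"
    for _ in range(n_sub):
        read_folder(f, path, found)       # repeat the process from the beginning
    for _ in range(n_files):
        offset, size = struct.unpack("<2I", f.read(8))
        found.append((path + read_name(f), offset, size))

def read_nested_archive(data):
    """Return (path, file offset, file size) for every file, reading strictly forwards."""
    f = io.BytesIO(data)
    tag, n_root, n_total = struct.unpack("<4s2I", f.read(12))
    found = []
    for _ in range(n_root):
        read_folder(f, "", found)
    return found
```

Note that the reader never seeks: every entry is found simply by reading on from where the previous one ended, which is why this layout avoids the jumping around of the Tree Archive.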
== Checking Your Results ==
Revision as of 12:18, 17 October 2006
This document explains in detail how to start exploring and examining file formats, with a focus on Game Resource Archives. For beginners and advanced users alike.
The definitive word in archive exploration.
Download below, or scroll on down and read it here:
Authors: Mr.Mouse and Watto
Version: 1.0 as of November 2004
Rewritten for the WIKI by Dinoguy1000 as of August 2006
Title page
| THE DEFINITIVE GUIDE TO |
| EXPLORING FILE FORMATS |
= Revision 2 =
WATTO
(www.watto.org)
Mike Zuurman
(www.xentax.com)
Table of Contents
Introduction
General Introduction
Computer games are vast and many, covering a wide range of genres and game styles, but there is one fundamental feature that all games require - resources. Every game has a range of resources that help make it unique - from texture images to audio soundtracks. With all these resources, there needs to be a way they can be stored so that games can use them, and the way this is typically done is to store them in a big archive file.
| An archive is a single computer file that contains the data for several smaller files. A common analogy would be a cardboard box - it can be used to store a lot of different items (paper, food, objects), and each item can have different properties (size, color, shape) |
The question that may arise is "why do game developers use archives to store their game resources? Wouldn’t it be easier to just store all the files normally?" The answer is yes, storing the files normally would be much easier, and certainly much better during the game development, but before the final production they are packaged into archives for several reasons…
- An archive can store a lot of files in a single location, so it is quicker to access the files from a hard disk or CD
- A large archive, due to it being in 1 block on the disk, can utilise features such as file buffers, further increasing read performance
- It reduces the number of files on the disk, making the reading of the file index quicker
- The files can be hidden away, making it harder to hack or modify the game
- All files can be accessed using a single file stream, reducing the time required to generate file stream objects, and making the file access programming simpler
- Files can be compressed easily, and other information such as file descriptions and ID numbers can be stored
Purpose Of This Book
Unfortunately, there is a downside to using archives - there are no real standards defined for the creation and use of archives. In order to read or write archives for a particular game, someone usually needs to analyse the file themselves, or perform other complicated and time-consuming tasks such as reverse engineering or hex editing.
Some of the more modern games produced these days recognise that they can gain extra advertising by allowing the internet community to mod their games. Due to this, some game developers have changed to supporting standard archive types, such as Zip archives, however there is still an overwhelming number of games with their own proprietary archive formats.
| Mod, short for modification, refers to the alteration of a computer game by a member of the internet community, usually to support extra functionality or to generate a different game built on top of the original. Some examples include changing the sounds and textures used by a game, or creating new game maps. |
This book aims to provide an insight into the way game archives are created, and how to analyse an archive to locate the files contained within. In the following pages, we will discuss some of the basic fundamentals of computer-stored numbering, common structures used by most archives, compression, encryption, and the tools that you can use to help get the job done. Hopefully, by the time you have finished reading this book, you will be able to analyse your own archives, and take the first step towards your own development and game modding.
Thanks for reading our book; we wish you the best of luck in your exploration.
Formatting Used In This Book
| Link | A link to a website of interest or for further information. |
| Link | A link to a different section of the document. |
| Term | An important term, or a term that is being defined. A general comment, or clarification of a point. |
| Value | A value, usually in an example |
| Caption | Caption for an image, or a reference to some information in the image |
| Reference | A tool reference, such as a menu, button, or action in a specific program. |
| Brief descriptions of a term, related notes, or other supplementary material will be presented in a box like this. This will often accompany a term. |
What is a GRAF?
The term GRAF describes the way a game archive is constructed, and in particular, the storage of the files within the archive. The format of an archive usually differs between each individual game, however occasionally a game developer will stick with a particular format for a few games of the same vintage, particularly if the games are built using the same underlying game engine.
| GRAF stands for Game Resource Archive Format, which is most simply the specifications describing the format of a particular archive. |
Programmers usually define their GRAFs according to the needs and structure of the game itself. For example, the memory in an XBOX game console is based around blocks of 2048 bytes - the GRAFs for most XBOX games utilise this so the game data can be opened efficiently.
The development of a GRAF is particularly troublesome - there is a constant weigh-up between factors such as efficient storage, quick loading, and fast targeting. One of the things that has great influence is human readability - the things that make archives easy for humans to use, often make it less efficient. For example, the storing of filenames in an archive tells humans the purpose and type of data, however it is very inefficient and slow to read filenames from an archive - thus the weigh-up.
| Efficient storage: Files need to be stored in a way that conserves space on the disk and/or in memory. |
| Quick Loading: When the game is loading, the required resources are loaded into memory - this needs to be done quickly, while still gathering all the required information. |
| Fast Targeting: When a resource is loaded into memory, it needs to be quick and easy for the game to find the file. This is usually a big weigh-up between human readability (filenames) vs. computer efficiency (hash fields and trees). |
During the game development, the actual resources used by the game change frequently. To make it quick and easy to adapt the changes, the GRAF is usually structured following a common and recognisable pattern, some of which will be described in later chapters.
Tools of the Trade
Hex Editors
The generic hex editor is the main type of program used to view data in non-text files, such as archives. Similarly to the way word processors display text data, a hex editor displays the contents of a file using hex characters.
| Hex characters are an alternate way to represent the byte data in a file. Whereas word processors display byte values as letters, a hex editor displays each byte as a 2-character code that covers all possible values 0-255 (00-FF). The way to read and construct hex values is discussed in a later chapter. |
There are literally hundreds of hex editors available for use - the one that you choose is a matter of personal taste. All hex editors have the same basic functionality, but some provide other tools and features that make it quicker and easier to work with files. Most hex editors are freely available over the Internet.
Following, we will provide a brief introduction into our own preferred programs, so you can see the general style and features available to you. This list is personal preference only - we encourage you to actively seek out your own preferred programs.
Hex Workshop from Breakpoint Software will be used for the examples and screenshots in this book, however the processes and screens should be similar across all hex editors. Hex Workshop includes several handy functions for analysis work, such as:
- A hexadecimal calculator
- Lists of the data types at the current location in the file
- Bookmarking
- Colour mapping.
| Hex Workshop is available from http://www.bpsoft.com |
While we encourage you to try many different programs, the one you ultimately choose should be based on your needs.
Hex Workshop
Here we present a brief introduction into the use of Hex Workshop. Although this is the main program that will be used for the screenshots in this book, take note that almost everything in this program can be applied to other hex editors, including the interface structure and layout.
Figure 3.1.1a: General layout of Hex Workshop
A. Hexadecimal representation of the file content
B. ASCII interpretation of the file content
C. Different representations of the data at current cursor position
D. User-assigned bookmarks and their descriptions
When you have installed Hex Workshop, a convenience link is added to the context menu of Windows Explorer. Just right-click on a file and select "Edit with Hex Workshop" to open the file in the program.
| The context menu is the menu that appears when you right-click in a Windows program. It is so named because the links in the menu depend on the context of the right-click. For example, right-clicking on a file gives different choices from right-clicking on a selected piece of text. |
Once you have opened a file, you will be presented with a view similar to that depicted in Figure 3.1.1a. You can examine the file's hexadecimal interpretation in section A, or the ASCII interpretation of the same bytes in section B. The table to the far left shows the offset of the lines shown.
| An offset is the location of the file data in relation to the start of the file. For example, an offset of value 560 means there are 560 bytes of data before you reach the current location. |
In this example, we have opened one of the *.pk4 files from the game Doom 3. We will later see that these are actually generic *.zip files. For now, you can see the file starts with the characters PK. The characters at the beginning of a file are often referred to as a header, ID tag, or magic number - and are usually a reliable way to identify whether the file is a common type. For example, all *.zip archives have the characters PK at the beginning, therefore there is a strong probability that the archive in our example is a *.zip archive. A brief list of some common header tags can be found later in the book.
| A header tag is simply a small group of bytes at the start of a file that helps to identify the format of the remaining data. The header tag is usually a 4-byte string, however it can also be a preset set of byte values. While it is true that a file's extension can help determine a file format, it is often unreliable and can be easily changed, whereas a header tag is harder to alter and is usually unique. In reality, the best way to determine a file's format is to use a combination of the file extension and the header tag. |
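Checking a header tag is easy to do in code as well as in a hex editor. The following Python sketch reads the first bytes of a file and tests for the PK characters; the function names are our own, and the demonstration builds a small zip archive in memory with the standard zipfile module rather than using a real *.pk4 file.

```python
import io
import zipfile

def read_header_tag(f, length=4):
    """Return the first `length` bytes of a binary file object."""
    f.seek(0)
    return f.read(length)

def looks_like_zip(f):
    """True if the data starts with the characters PK."""
    return read_header_tag(f, 2) == b"PK"

# Demonstration on an in-memory zip archive containing one small file.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as z:
    z.writestr("readme.txt", "hello")
print(looks_like_zip(buf))
```

As the note above says, a matching tag only gives a strong probability, so a real tool would normally combine this check with the file extension.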
The current position of the cursor in our example is at offset 18. The Data Interpreter in section C shows the different interpretations of the data at this file position, ranging from numbers to strings. The different data interpretations are covered more completely in a later chapter.
In our example image, we have color mapped and bookmarked (as in section D) some areas of our interest. Any range of bytes can be bookmarked or color mapped - simply click and drag the cursor along your area of interest and select the appropriate option from the context menu. When you make a bookmark, you can choose the data interpretation of the selection (its value), and give a description. The bookmarks will be shown with their offset in the file and the length in bytes. This is a very useful feature, as it allows you to click on a bookmark to jump to that offset.
| Color mapping: assigns a color to the selected area, to make it stand out. |
| Bookmarking: records the current cursor location in section D, with a user-defined description. |
Hex Workshop has the ability to save the bookmarks and color maps, so that you can load them on another file and see if the pattern matches. For example, if you have solved the pattern of a GRAF, you can apply the bookmarks and colour mapping to other files that you expect to have the same format.
Hex Workshop has another handy function - GoTo. If you select a range of bytes in a file and choose GoTo from the context menu, you can jump to the location identified by the selected value.
Terms, Definitions, and Data Structures
To understand the patterns and construction of archives, we must first introduce the concept of data structures, and some of the fundamentals of computerized data.
Files
A computer file is a series of bytes stored one after the other which, when combined together, form a representation of a piece of data. If you have a file that is 12 bytes in size, it indicates that there are 12 single bytes of data that are used to represent the entire document.
| The term File stems from the original computer metaphor as an office replacement. As in a work office, files were organised into folders, where each folder contained a group of related files. |
File sizes start at the byte, and the term changes at every multiple of 1,024 (although, for ease of human use, most people refer to multiples of 1,000). The following table shows the file size terms:
| Term | In bytes | In KB | In MB | In GB |
| Byte (B) | | | | |
| Kilobyte (KB) | 1,024 (1 thousand bytes) | | | |
| Megabyte (MB) | 1,048,576 (1 million bytes) | 1,024 (1 thousand KB) | | |
| Gigabyte (GB) | 1,073,741,824 (1 billion bytes) | 1,048,576 (1 million KB) | 1,024 (1 thousand MB) | |
| Terabyte (TB) | 1,099,511,627,776 (1 trillion bytes) | 1,073,741,824 (1 billion KB) | 1,048,576 (1 million MB) | 1,024 (1 thousand GB) |
| In actual fact, computer data is stored using bits, not bytes. A bit is the smallest unit that a computer can deal with, however all modern file systems treat a byte as being the smallest unit as a byte is capable of storing relatively useful information. It is impossible to store a single bit in a modern file system - the best that can be done is to store a single byte that has the same value as the bit. |
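The file size terms above can be worked out mechanically by dividing by 1,024 until the number is small enough. The following Python function is a small sketch of that idea; the name format_size and the choice of two decimal places are our own.

```python
def format_size(num_bytes):
    """Express a byte count in the largest suitable unit, in steps of 1,024."""
    units = ["B", "KB", "MB", "GB", "TB"]
    value = float(num_bytes)
    for unit in units:
        # Stop when the value fits this unit, or when we run out of units.
        if value < 1024 or unit == units[-1]:
            return f"{value:.2f} {unit}"
        value /= 1024

print(format_size(1536))
```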
Bits
When we talk about the basic structure of a file, we typically think in terms of bytes. However, at its absolute simplest, the actual underlying file structure is a sequence of bits or binary values. We don’t usually deal with this level of representation because a single binary value cannot represent much on its own. However, when grouped into sets of 8 bits, the range of information that can be stored becomes satisfactory.
A bit, or binary value, is the language of a computer, and thus the underlying structure of everything readable by a computer. A bit only has 2 possible values – 1 or 0 – thus it is obvious why they are limited in what they represent.
| The 2 possible values of a bit, 0 and 1, are also commonly referred to as being either false or true (respectively). It can therefore be said that a bit is either a true-bit or a false-bit. |
| Sometimes, although less common, a bit with value 0 is referred to as being disabled, and value 1 is enabled. This can sometimes help in user understanding, depending on the context of the discussion. |
A byte, the fundamental building block of files, is constructed using a group of 8 bits. The combination of 8 bits allows a byte to hold any value between 0 and 255, much more than the 2 possible values available to a single bit.
So how do the grouped bits represent a larger numerical value such as that of a byte? This is achieved quite easily by referring to each of the 8 bits as an increasing power of 2.
If we take a look at a single bit, we can think of it as having either the value 1×2^0 or 0×2^0, giving us the values 1 or 0 respectively. If we add a bit to the left, the value of the new bit is either 1×2^1 or 0×2^1, that is, either 2 or 0. By adding the values of these 2 bits together, you should be able to see that all possible combinations give us the values 0, 1, 2, and 3, as shown in the table below:
| Bit 1 (2^1) | Bit 0 (2^0) | Value |
| 0 | 0 | 0 (0×2^1 + 0×2^0) |
| 0 | 1 | 1 (0×2^1 + 1×2^0) |
| 1 | 0 | 2 (1×2^1 + 0×2^0) |
| 1 | 1 | 3 (1×2^1 + 1×2^0) |
If we continue this pattern for the remaining 6 bits, our highest bit will provide the power 2^7. If all 8 bits are enabled, we end up with the number 255 (1×2^7 + 1×2^6 + … + 1×2^1 + 1×2^0). Appendix 1 provides a list of all possible byte values, and their bit values.
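The same sum of powers of 2 can be written as a few lines of Python. This is only an illustrative sketch (the function name is ours); Python's built-in int(bits, 2) gives the same answer.

```python
def bits_to_value(bits):
    """Add up bit x 2^position, counting the rightmost bit as position 0."""
    total = 0
    for position, bit in enumerate(reversed(bits)):
        total += int(bit) * 2 ** position
    return total

print(bits_to_value("11"), bits_to_value("11111111"))
```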
Bytes
As described above, a byte is comprised of 8 bits, and can thus contain a value between 0 and 255. Bytes are the smallest unit that a modern file system can deal with, so for the majority of file format analysis you will only need to look at the byte level.
All files are stored and accessed using bytes. When you open a file in a program or game, the bytes of the file are interpreted according to the logic of the application. For example, a word processor treats all bytes as being letters or numbers, whereas a hex editor displays bytes as hex codes. Hex codes will be discussed in a later chapter.
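To see how the same bytes can be interpreted in two ways, the following Python sketch shows a few bytes both as hex codes and as text, much like sections A and B of a hex editor. The function name and the use of a dot for non-printable bytes are our own choices.

```python
def hex_and_text(data):
    """Return the same bytes as 2-character hex codes and as printable ASCII."""
    hex_view = " ".join(f"{byte:02X}" for byte in data)
    text_view = "".join(chr(byte) if 32 <= byte < 127 else "." for byte in data)
    return hex_view, text_view

print(hex_and_text(b"PK\x03\x04"))
```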
16-bit (2-byte) numbers
From this point forward, we need to be careful when referring to particular data types. Why? Because as computers and programming languages have evolved, the terminology has changed and confusion can arise. Therefore we will primarily refer to each data type as the number of bits or bytes that comprise it.
We will also briefly introduce the terms for each group of programming languages, so you will be able to program with them.
A 16-bit value is commonly known in older programming languages as a word or an Integer. Newer programming languages call it a Short.
| The term older programming languages refers to the language C++, and any language that was derived before this time, such as C, Visual Basic (1.0 - 6.0), ASP, Perl, Pascal, etc. |
| The term newer programming languages refers to languages derived after C++, such as Java, Python, Delphi, and the .Net languages (C#, VB.net, ASP.net, J#). |
A 16-bit number is just as the name suggests, a number created by 16 bits in a row. To determine the value of the 16-bit number, we follow the same process as when we wanted to get the value of a byte.
Each of the 16 bits that make up the 16-bit number represents a power of 2: the leftmost bit represents 2^15 and the rightmost bit 2^0. Just as with bytes, we go through each bit and calculate bit value × power.
An example: let's say we have the following 16 bits…
0101111000001100
Working from left to right, we get the value…
0×2^15 + 1×2^14 + 0×2^13 + 1×2^12 + 1×2^11 + … + 1×2^2 + 0×2^1 + 0×2^0
If you work this out, you should end up with the number 24076.
If all 16 bits had the value 1, you would end up with the number 65535 – therefore the value of a 16-bit number ranges between 0 and 65535.
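The bit-by-bit calculation can be sketched in a few lines of Python. This is only an illustration: the function name bits_to_value is made up for this example and is not part of any library.

```python
def bits_to_value(bits: str) -> int:
    """Add up bit x power-of-2 for a string of 0s and 1s (leftmost bit is highest)."""
    total = 0
    highest_power = len(bits) - 1
    for position, bit in enumerate(bits):
        total += int(bit) * 2 ** (highest_power - position)
    return total


example = bits_to_value("0101111000001100")  # 24076
all_ones = bits_to_value("1" * 16)           # largest 16-bit value, 65535
print(example, all_ones)
```

The same function works for 8, 32, or 64 bits, because the highest power is taken from the length of the string.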
32-bit (4-byte) numbers
A 32-bit number follows the same principles as a 16-bit number, with the exception that there are now 32 bits that represent the value. Therefore, the highest bit has the value 2^31 and the lowest bit has the value 2^0.
| The bit with the highest power value is known as the high-order bit, and in the same vein, the bit with the lowest power value is the low-order bit. When a number is written out in binary, the high-order bit is on the left and the low-order bit on the right. Endian ordering is a separate matter that concerns the order in which the bytes of a multi-byte number are stored; more information about endian ordering will be presented in a later chapter. |
If all the bits for a 32-bit number were enabled, we would have the value 4,294,967,295, thus the range of values for a 32-bit number is 0 to 4,294,967,295.
A 32-bit number in older programming languages is known as a dword or a Long. In newer languages, this is known as an Integer.
| Dword is an abbreviation for Double Word, meaning that a dword has double the number of bits that a word has (ie 32 = 2x16). |
64-bit (8-byte) numbers
As with 32-bit numbers, the value of a 64-bit number is calculated with the highest bit having the value 2^63 and the lowest 2^0. Thus, the range is 0 to a massive 18,446,744,073,709,551,615. Due to the extreme size of this number, it is not expected that we will ever need to define a larger term.
64-bit numbers are not supported by some of the older programming languages - those that do call it a qword. All newer programming languages refer to this data type as a Long.
| 64-bit numbers are relatively new concepts in the computer world, brought on by the ever-increasing size of hard drives, and technologies such as DVD. Old file systems such as FAT-32 (used by Windows 95 and Windows 98) were, as the name suggests, built around 32-bit numbers, but this inherently caused a problem with large files. Because a 32-bit number has a maximum value of 4,294,967,295, a single file could not be larger than about 4.3GB. (The drive itself could be larger than this, because FAT-32 addresses the disk in clusters rather than in individual bytes.) To remove this limit, 64-bit numbers were introduced, which allow for practically unlimited file sizes. 64-bit numbers are used in more modern file systems (such as NTFS, used by Windows XP), and for technologies like DVD that have large storage space. |
| A similar situation occurred during the transition from Windows 3.1 to Windows 95, where computer systems that were originally built on the FAT-16 16-bit file system were upgraded to FAT-32. |
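If you want to check these ranges yourself, Python's standard struct module can turn raw bytes into 16-bit, 32-bit, and 64-bit numbers. The sketch below is only an illustration: the variable names are made up, and the "<" prefix (Little-Endian byte order) is an assumption that happens not to matter here because every byte is the same.

```python
import struct

# Eight bytes with every bit set, as they might appear in a file.
raw = bytes([0xFF] * 8)

value16 = struct.unpack("<H", raw[:2])[0]  # 2 bytes -> unsigned 16-bit number
value32 = struct.unpack("<I", raw[:4])[0]  # 4 bytes -> unsigned 32-bit number
value64 = struct.unpack("<Q", raw[:8])[0]  # 8 bytes -> unsigned 64-bit number

print(value16, value32, value64)
```

Each result is the largest value of its type, matching the upper ends of the ranges given in this chapter.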
Strings
One of the most common tasks performed on a computer is word processing, so naturally we need some way of representing text in a document. A piece of text in a document is called a String, which more formally means a sequence of characters.
| You need to be careful when using the term character, as it can be different depending on the programming language, and indeed depending on the language of your country. A character in older programming languages is usually the same as a byte, whereas in newer languages it is often the same as a 16-bit short. If the game or file was developed in a primarily English-speaking country (as most are), characters will usually be bytes regardless of the programming language used to write the game. Games from non-English speaking countries will usually be 16-bit shorts. |
Although there are many languages in the world, the most widely used Latin-script language in the Western world is English. The English script consists of 52 letters (upper and lower case), 10 digits, and about 30 symbols. As this adds up to about 92, it seems quite logical that we can represent each character as a different byte value (remembering that a byte supports 256 different values). This is exactly what happens when you open a text document in a word processor: the word processor reads the bytes of the file and represents each byte value as a character.
For example, when the word processor reads a byte with value 65, it displays the letter ’A’. The byte value 100 represents the letter ’d’. Therefore, you can open any file in a word processor and it will be displayed as characters, regardless of whether it is a text document or not - the word processor simply doesn’t know that it isn’t a text file. The representation of a byte as a character is defined as ASCII, for which the character associations are listed in Appendix 2.
| ASCII stands for American Standard Code for Information Interchange, which was originally defined as a 7-bit character system (as all letters and numbers account for less than 128 values). As computer systems evolved, bytes became the standard unit of the computer, and as such the ASCII standard was adjusted into a full 8-bit character system. The original letters remained the same, with the 8th bit having the value 0. The newly-created 128 characters (the ones with the 8th bit equal to 1) were assigned to additional common characters such as letters with accents, foreign currency symbols, and other miscellaneous symbols like fractions and degrees. |
To expand the computing world into other languages, it became apparent that there are hundreds more letters and symbols, many more than the originally-defined 256. Therefore, an alternate character scheme called Unicode was created, which in its common 2-byte encoding (UTF-16) uses 2 bytes to represent each character rather than the usual 1 byte. To accommodate the original ASCII coding scheme, the value for each ASCII character is the same as the value for the first byte in each Unicode character, with the second byte having the value 0.
It is usually easy to determine whether a string is ASCII or Unicode. ASCII strings are easy to read in a hex editor, whereas the same English string represented as Unicode has a null byte after each letter (the second byte of each Unicode character).
Here is an example string represented as ASCII and Unicode. Note that the null bytes are represented with a . symbol, as is common in many hex editors.
| Original: | When I run fast, my legs get tired. |
| ASCII: | When I run fast, my legs get tired. |
| Unicode: | W.h.e.n. .I. .r.u.n. .f.a.s.t.,. .m.y. .l.e.g.s. .g.e.t. .t.i.r.e.d... |
As you can see, the ASCII string appears the same as it would in a word processor, whereas the Unicode string consumes 2 bytes per character and thus seems padded out. Note that every character in the Unicode string has 2 bytes, including the spaces, the comma, and the full stop.
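You can reproduce this comparison with a short Python sketch. It assumes that the 2-byte Unicode form is the UTF-16 Little-Endian encoding (Python's "utf-16-le" codec); the helper hex_editor_view is a made-up name that imitates the text column of a hex editor.

```python
text = "When I run fast, my legs get tired."

ascii_bytes = text.encode("ascii")        # 1 byte per character
unicode_bytes = text.encode("utf-16-le")  # 2 bytes per character, low byte first


def hex_editor_view(data: bytes) -> str:
    """Show printable bytes as characters and every other byte as a dot."""
    return "".join(chr(b) if 32 <= b < 127 else "." for b in data)


print(len(ascii_bytes), hex_editor_view(ascii_bytes))
print(len(unicode_bytes), hex_editor_view(unicode_bytes))
```

The second line printed is exactly twice as long as the first, with a dot (the null byte) after every character.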
Hexadecimal Numbering
Hexadecimal numbering is an alternate way to represent the byte values 0-255. In traditional numbering, you need 1-3 characters to display the possible values for a byte. For example, the number 5 requires 1 character, whereas the number 113 requires 3 characters. Hexadecimal numbering was introduced to represent all byte values using exactly 2 characters, which means that bytes can easily be arranged into neat rows and columns.
Recall that a byte contains a value between 0 and 255. To write any of these values in hexadecimal, we split it into 2 characters, each representing a power of 16. This is done in a similar way to binary numbers, where each character represents a power of 2.
A problem arises: how do we represent 16 possible values in a single character? It is obvious that the values 0-9 can be represented as normal numbers, and for the values 10-15 we assign the letters A through F respectively. For example, the letter C in hexadecimal represents the value 12.
So how do we write a number in power 16? As mentioned earlier, the byte value is split up into 2 characters, with the first number representing 16^1 and the second number representing 16^0. You should notice that this is the same way bits are joined together to form a byte.
The second number of the pair can take any value between 0 and 15 (labelled 0 through F), where the value represents number x 16^0. So, if the second number was 6, it would represent 6x16^0, the value 6. If the second number was B, it would similarly represent 11x16^0, the value 11.
The first number of the pair represents the value number x 16^1. So, if the first number was 2, it would represent 2x16^1, the value 32.
Let’s look at a full example now. If we are given the hexadecimal value 1F, what does it represent? The number 1 means 1x16^1, and the F means 15x16^0. Added together, we get 16 + 15, the value 31. Similarly, the hexadecimal number E3 represents 14x16^1 + 3x16^0, the number 227.
It should be clear to you now that we can represent any byte (values 0 through 255) in the hexadecimal number system using the values 00 through FF. If we are writing a hexadecimal number in a document, we use the format &h# . For example, &hE3 means E3 in the hexadecimal coding scheme.
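The two-character calculation can be sketched in Python as follows. The function name hex_byte_to_value is made up for this illustration; in practice Python's built-in int(text, 16) and format(value, "02X") do the same conversions in both directions.

```python
def hex_byte_to_value(pair: str) -> int:
    """Convert a 2-character hexadecimal pair such as 'E3' to its byte value."""
    digits = "0123456789ABCDEF"
    first = digits.index(pair[0].upper())   # counts in units of 16^1
    second = digits.index(pair[1].upper())  # counts in units of 16^0
    return first * 16 ** 1 + second * 16 ** 0


print(hex_byte_to_value("1F"), hex_byte_to_value("E3"))  # 31 227
print(format(227, "02X"))                                 # E3
```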
Signed and Unsigned Numbers
Hopefully by now you can clearly see how numbers are stored in files, and even how strings are stored, but what about negative numbers? Luckily, negative numbers are really easy.
There are only 2 possible types of numbers – either positive or negative. This maps perfectly with a single bit of value 0 or 1 respectively.
Rather than add an extra bit to a number, we take the bit with the highest value and interpret it as a sign. In an 8-bit number, for example, you would add up the bits from 2^0 to 2^6 as usual, and the 2^7 bit determines whether the value is positive or negative: if it is set, 128 is subtracted from the total. (This scheme is known as two’s complement, and is what practically all modern computers use.)
You should note that because the highest bit is being used for another purpose (identifying positive/negative), it cannot be used as part of the positive range of the number. This effectively cuts the largest possible value in half. In our example, you would normally be able to have any value between 0 and 255, however with the sign bit we now have numbers between -128 and 127. As there is no such thing as -0, the bit code 10000000 is given the value -128.
Here we need to introduce a way of knowing whether a number will be positive-only, or a positive/negative number. We therefore use the term signed to indicate that the highest bit is used as a sign, or the term unsigned indicating the number is always positive. Therefore, if you are told a 16-bit number is unsigned, you will know the number ranges between 0 and 65535. However, if it was a signed 16-bit number, it would range between -32768 and 32767.
| The type of file usually determines whether the numbers are signed or unsigned. For example, archives and images are almost solely unsigned values. 3D-related files are often signed, as it is possible to have points in the negative as well as the positive plane. |
You should note that signed numbers are extremely rare for archives, and as such, you should assume all numbers used in archives and in this document are unsigned.
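To see the difference between the two interpretations, here is a small sketch using Python's built-in int.from_bytes, which reads signed values as two's complement (the signed encoding used by practically all modern systems). The variable names are made up for this example.

```python
one_byte = bytes([0b10000000])  # only the highest bit is set

as_unsigned = int.from_bytes(one_byte, "little", signed=False)  # 128
as_signed = int.from_bytes(one_byte, "little", signed=True)     # -128

# The same idea for 16-bit values: the smallest and largest signed numbers.
lowest_16 = int.from_bytes(bytes([0x00, 0x80]), "little", signed=True)   # -32768
highest_16 = int.from_bytes(bytes([0xFF, 0x7F]), "little", signed=True)  # 32767

print(as_unsigned, as_signed, lowest_16, highest_16)
```

The bytes are identical in both readings; only the interpretation of the highest bit changes.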
Big-Endian and Little-Endian
If you paid close attention, you would have noticed that whenever we calculated a number, the bit with the highest value was always on the left, and the lowest value on the right. Within a single byte this is always the case. However, when a number is made up of several bytes, the bytes themselves can be stored in the file in either order, and different files, programs, and computer systems have chosen differently. So once again, we need to define some terms so that people know what order we are talking about.
| In Little-Endian order, the low-order byte (the one holding the lowest powers of 2) is stored first; in Big-Endian order, the high-order byte is stored first. Little-Endian order is the one we will be using in this document, and unless stated specifically you should assume that Little-Endian order is used in any file. |
So let’s see an example. Take the following 2 bytes, shown in the order they are stored in a file
00001100 01011110
Read separately, the first byte has the value 12 and the second byte the value 94. In Little-Endian ordering, the first byte is the low-order byte, so the 16-bit number is
94x2^8 + 12x2^0 = 24076
In Big-Endian ordering, the first byte is the high-order byte, so the same 2 bytes give
12x2^8 + 94x2^0 = 3166
It is always important to read the bytes in the correct order, otherwise you will end up with numbers that are meaningless and incorrect. As mentioned, if you don’t know which order to use, assume Little-Endian ordering, as we will be using Little-Endian order for all examples in this document.
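In practice, endian ordering matters whenever a number spans more than one byte, because it decides which byte in the file is the low-order one. A minimal sketch with Python's built-in int.from_bytes (the variable names are made up for this example):

```python
raw = bytes([0x0C, 0x5E])  # two bytes, in the order they sit in the file

little = int.from_bytes(raw, "little")  # first byte is low-order:  0x5E0C
big = int.from_bytes(raw, "big")        # first byte is high-order: 0x0C5E

print(little, big)  # 24076 3166
```

If a value you read from an archive looks absurdly large, reading the same bytes in the other order is a cheap thing to try.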
File Offsets
One of the most fundamental concepts of format exploring is the file offset. A file offset is the position of a certain piece of data in a file, measured from the first byte of the file. However, as with most computer programming, we start our number counts at 0, not at 1. Therefore, if we are at the very beginning of the file, before we read anything, we are at offset 0. After we read 1 byte, we are at offset 1. Read another 6 bytes and we are at offset 7.
If the concept is a little hard to grasp, think of an offset as being a bar that divides a file up byte-by-byte. In the diagrams below, each digit stands for one byte and the I is the bar. If we are at the beginning of a file, offset 0, we have a bar right at the beginning before the first byte
I0110001011011000001011110
If we are at offset 3, we place the bar after the 3rd byte of the file, and before the 4th byte (ie. we have read 3 bytes)
011I0001011011000001011110
Similarly, offset 16 places the bar after byte 16, and before byte 17
0110001011011000I001011110
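The same idea can be tried out in Python, where tell() reports the current offset and seek() moves the bar. The sketch uses an in-memory io.BytesIO object as a stand-in for a real file; the 25-byte content is made up so that each byte's value equals its offset.

```python
import io

data = io.BytesIO(bytes(range(25)))  # a 25-byte stand-in for a file

start = data.tell()          # 0: nothing read yet
data.read(1)
after_one = data.tell()      # 1: one byte read
data.read(6)
after_seven = data.tell()    # 7: another 6 bytes read

data.seek(16)                # jump straight to offset 16
byte_at_16 = data.read(1)[0]  # this is the 17th byte of the file

print(start, after_one, after_seven, byte_at_16)
```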
Archive Patterns
There are literally thousands of different archives out there, however most archives will conform to one of several basic patterns. Here we present the basic archive patterns so you can understand how the files are built, and how they can be read. Once you know the patterns, you can identify archive formats much faster.
| Note that the samples presented here list some fields like the number of files and the header. These fields are not fixed, and indeed may be totally different to the structure of your archive. You should use the samples presented here as a guide to the overall structure, not as an exact guide to a specific format. |
In these examples, the numerical value at the start of each field indicates the number of bytes used to contain the field value. For example, the line 4 - Directory Offset shows that there are 4 bytes used to store the directory offset. This field would thus be read as a 32-bit number, as described earlier.
| Note that most fields will either be 2, 4, or 8 bytes in length, corresponding to the data types presented earlier (16-bit, 32-bit, and 64-bit respectively). The main exception is the filename field, which naturally could be any arbitrary length. |
Directory Archives
Directory Archives are by far the most common structure in use today. As the name suggests, these archives store a directory that lists details about all the files, such as their name, offset and length. These archives are usually simple and very easy to read.
The directory can be stored anywhere in the archive, however it is typically close to the beginning or the end. If the directory is not at the start of the archive, there will typically be a field that tells the offset to the directory, so that you can find it easily. This field is called the directory offset, and is usually found in the header or at the very end of the archive.
Here is a sample graphic representation of this archive structure:
| Archive Header | ||
| 4 - Header Tag (String) 4 - Number of Files | ||
| Directory | ||
| File Entry 1 | ||
| 4 - File Offset 4 - File Size X - Filename | ||
| File Entry 2 | ||
| 4 - File Offset 4 - File Size X - Filename | ||
| … | ||
| File Entry n | ||
| 4 - File Offset 4 - File Size X - Filename | ||
| File Data | ||
| File Data 1 | ||
| X - File Data | ||
| File Data 2 | ||
| X - File Data | ||
| … | ||
| File Data n | ||
| X - File Data | ||
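To make the structure concrete, here is a minimal Python sketch that builds a tiny Directory Archive in memory and reads it back. Everything specific in it is an assumption made for illustration, not a real format: the tag "ARCH", Little-Endian 32-bit fields, null-terminated ASCII filenames, and the directory placed directly after the header.

```python
import io
import struct


def read_name(stream):
    """Read ASCII characters up to (not including) a null byte."""
    name = bytearray()
    while True:
        char = stream.read(1)
        if char in (b"", b"\x00"):
            break
        name += char
    return name.decode("ascii")


def read_directory_archive(stream):
    tag = stream.read(4)                             # 4 - Header Tag (String)
    (count,) = struct.unpack("<I", stream.read(4))   # 4 - Number of Files
    entries = []
    for _ in range(count):
        offset, size = struct.unpack("<II", stream.read(8))
        entries.append((read_name(stream), offset, size))
    files = {}
    for name, offset, size in entries:
        stream.seek(offset)                          # jump to the file data
        files[name] = stream.read(size)
    return tag, files


# Build a sample archive: header, directory, then the file data.
payloads = {"a.txt": b"hello", "b.bin": b"\x01\x02\x03"}
cursor = 8 + sum(8 + len(name) + 1 for name in payloads)  # first data offset
directory = b""
for name, body in payloads.items():
    directory += struct.pack("<II", cursor, len(body)) + name.encode("ascii") + b"\x00"
    cursor += len(body)
archive = b"ARCH" + struct.pack("<I", len(payloads)) + directory + b"".join(payloads.values())

tag, files = read_directory_archive(io.BytesIO(archive))
print(tag, files)
```

A real archive may store filenames with a length prefix or a fixed width instead of a null terminator, so the read_name helper is the part most likely to need changing.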
Split Directory Archives
Split Directory Archives are similar in structure to the Directory Archive, with the main difference that there are multiple separate directories rather than a single collective directory. The way the directories are split is totally up to the individual - so here we will present 2 of the more common split types.
This first example is a split directory, where the first directory contains the offsets and lengths, and the second directory contains the filenames:
| Archive Header | ||
| 4 - Header Tag (String) 4 - Number of Files 4 - Files Directory Offset 4 - Filenames Directory Offset | ||
| Files Directory | ||
| File Entry 1 | ||
| 4 - File Offset 4 - File Size | ||
| File Entry 2 | ||
| 4 - File Offset 4 - File Size | ||
| … | ||
| File Entry n | ||
| 4 - File Offset 4 - File Size | ||
| Filenames Directory | ||
| File Entry 1 | ||
| X - Filename | ||
| File Entry 2 | ||
| X - Filename | ||
| … | ||
| File Entry n | ||
| X - Filename | ||
| File Data | ||
| File Data 1 | ||
| X - File Data | ||
| File Data 2 | ||
| X - File Data | ||
| … | ||
| File Data n | ||
| X - File Data | ||
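Reading this first split type could be sketched in Python as below. The layout details are assumptions for illustration only (tag "ARCH", Little-Endian 32-bit fields, null-terminated ASCII filenames); the point is simply that each directory is reached through its own offset in the header.

```python
import io
import struct


def read_split_directory(stream):
    tag = stream.read(4)
    count, files_dir, names_dir = struct.unpack("<III", stream.read(12))

    stream.seek(files_dir)                   # Files Directory: offsets and sizes
    pairs = [struct.unpack("<II", stream.read(8)) for _ in range(count)]

    stream.seek(names_dir)                   # Filenames Directory
    names = []
    for _ in range(count):
        name = bytearray()
        while True:
            char = stream.read(1)
            if char in (b"", b"\x00"):
                break
            name += char
        names.append(name.decode("ascii"))

    # Entry i of one directory belongs with entry i of the other.
    return tag, [(n, o, s) for n, (o, s) in zip(names, pairs)]


# Build a sample archive with two files.
names = ["one.wav", "two.bmp"]
bodies = [b"AAAA", b"BB"]
files_dir_offset = 16                                 # right after the header
names_dir_offset = files_dir_offset + 8 * len(names)
names_blob = b"".join(n.encode("ascii") + b"\x00" for n in names)
cursor = names_dir_offset + len(names_blob)           # where file data starts
files_blob = b""
for body in bodies:
    files_blob += struct.pack("<II", cursor, len(body))
    cursor += len(body)
archive = (b"ARCH"
           + struct.pack("<III", len(names), files_dir_offset, names_dir_offset)
           + files_blob + names_blob + b"".join(bodies))

tag, entries = read_split_directory(io.BytesIO(archive))
print(entries)
```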
This second example is a split directory, where the first directory contains the offsets, the second directory contains the lengths, and the third directory contains the filenames:
| Archive Header | ||
| 4 - Header Tag (String) 4 - Number of Files 4 - Offsets Directory Offset 4 - Lengths Directory Offset 4 - Filenames Directory Offset | ||
| Offsets Directory | ||
| File Entry 1 | ||
| 4 - File Offset | ||
| File Entry 2 | ||
| 4 - File Offset | ||
| … | ||
| File Entry n | ||
| 4 - File Offset | ||
| Lengths Directory | ||
| File Entry 1 | ||
| 4 - File Size | ||
| File Entry 2 | ||
| 4 - File Size | ||
| … | ||
| File Entry n | ||
| 4 - File Size | ||
| Filenames Directory | ||
| File Entry 1 | ||
| X - Filename | ||
| File Entry 2 | ||
| X - Filename | ||
| … | ||
| File Entry n | ||
| X - Filename | ||
| File Data | ||
| File Data 1 | ||
| X - File Data | ||
| File Data 2 | ||
| X - File Data | ||
| … | ||
| File Data n | ||
| X - File Data | ||
External Directory Archives
External Directory Archives have the same structure as the Directory Archive, however the directory data and the file data are stored in 2 separate files. Naturally, the file that contains the file data is very large, and the directory file very small.
Note that the 2 files both have the same name, but different extensions. The extensions of the files can be anything, however some common extensions for the directory are *.dir, *.fat, and *.idx.
Here is a sample graphic representation of this archive type, where the Example.dir file contains the directory information, and the Example.dat file contains the file data:
Example.dir
| Archive Header | ||
| 4 - Number of Files | ||
| Directory | ||
| File Entry 1 | ||
| 4 - File Offset 4 - File Size X - Filename | ||
| File Entry 2 | ||
| 4 - File Offset 4 - File Size X - Filename | ||
| … | ||
| File Entry n | ||
| 4 - File Offset 4 - File Size X - Filename | ||
Example.dat
| File Data | ||
| File Data 1 | ||
| X - File Data | ||
| File Data 2 | ||
| X - File Data | ||
| … | ||
| File Data n | ||
| X - File Data | ||
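In code, the only change from a plain Directory Archive is that the entries are read from one file and the data from another. The Python sketch below uses two in-memory streams in place of Example.dir and Example.dat; the field layout (Little-Endian 32-bit numbers, null-terminated ASCII filenames) and the sample contents are assumptions for illustration.

```python
import io
import struct


def read_external_directory(dir_stream, dat_stream):
    (count,) = struct.unpack("<I", dir_stream.read(4))  # 4 - Number of Files
    files = {}
    for _ in range(count):
        offset, size = struct.unpack("<II", dir_stream.read(8))
        name = bytearray()
        while True:
            char = dir_stream.read(1)
            if char in (b"", b"\x00"):
                break
            name += char
        dat_stream.seek(offset)          # offsets refer to the data file
        files[name.decode("ascii")] = dat_stream.read(size)
    return files


# Made-up contents standing in for Example.dat and Example.dir.
example_dat = b"RIFFdata" + b"BMdata"
example_dir = (struct.pack("<I", 2)
               + struct.pack("<II", 0, 8) + b"snd1.wav\x00"
               + struct.pack("<II", 8, 6) + b"pic1.bmp\x00")

files = read_external_directory(io.BytesIO(example_dir), io.BytesIO(example_dat))
print(files)
```

Note that the offsets start from 0 in the data file, since there is no header or directory in front of the file data.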
Chunked Archives
Chunked Archives are a simple structure where the files are stored one after the other. Each file has its own header that gives information about the file, particularly the file size. These archives, probably the simplest of all the archive types, are examined by reading the header of the file, skipping the file data, then repeating again for the remaining files until you reach the end of the archive.
One thing to note: these archives typically don’t store filenames, rather they store a 4-byte String that can be treated like the file’s extension.
Here is an example of this archive type:
| Archive Header | |||
| 4 - Header Tag (String) | |||
| Chunks | |||
| File 1 | |||
| File Header 1 | |||
| 4 - File Type (String) 4 - File Size | |||
| File Data 1 | |||
| X - File Data | |||
| File 2 | |||
| File Header 2 | |||
| 4 - File Type (String) 4 - File Size | |||
| File Data 2 | |||
| X - File Data | |||
| … | |||
| File n | |||
| File Header n | |||
| 4 - File Type (String) 4 - File Size | |||
| File Data n | |||
| X - File Data | |||
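The read-header, skip-data, repeat loop could look like the following Python sketch. The tag "CHNK", the file types, and the Little-Endian 32-bit size field are assumptions for illustration; the function records where each file's data starts rather than loading it.

```python
import io
import struct


def read_chunked_archive(stream):
    tag = stream.read(4)                       # 4 - Header Tag (String)
    chunks = []
    while True:
        header = stream.read(8)
        if len(header) < 8:                    # end of the archive reached
            break
        file_type, size = struct.unpack("<4sI", header)
        data_offset = stream.tell()            # the file data starts here
        stream.seek(size, io.SEEK_CUR)         # skip over the file data
        chunks.append((file_type.decode("ascii"), data_offset, size))
    return tag, chunks


# A made-up archive holding two files.
archive = (b"CHNK"
           + b"WAVE" + struct.pack("<I", 5) + b"sound"
           + b"BMP " + struct.pack("<I", 3) + b"pic")

tag, chunks = read_chunked_archive(io.BytesIO(archive))
print(tag, chunks)
```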
Split Chunk Archives
Split Chunk Archives have the same basic structure as the Chunked Archives, however each file is also split up into chunks. Each file chunk is usually the same size (except for the last chunk in each file), which allows efficient use of buffers when reading the file.
Here is a sample graphic representation of this archive type:
| Archive Header | |||
| 4 - Header Tag (String) | |||
| Chunks | |||
| File 1 | |||
| File Header 1 | |||
| 4 - File Type (String) 4 - File Size 4 - Number Of Chunks 4 - Chunk Size | |||
| File Data 1 | |||
| X - File Data Chunk 1 X - File Data Chunk 2 … X - File Data Chunk n | |||
| File 2 | |||
| File Header 2 | |||
| 4 - File Type (String) 4 - File Size 4 - Number Of Chunks 4 - Chunk Size | |||
| File Data 2 | |||
| X - File Data Chunk 1 X - File Data Chunk 2 … X - File Data Chunk n | |||
| … | |||
| File n | |||
| File Header n | |||
| 4 - File Type (String) 4 - File Size 4 - Number Of Chunks 4 - Chunk Size | |||
| File Data n | |||
| X - File Data Chunk 1 X - File Data Chunk 2 … X - File Data Chunk n | |||
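A reader for this layout might be sketched in Python as follows. It assumes Little-Endian 32-bit fields and that the chunks of a file are stored back to back with no padding, so the last chunk is simply whatever remains of the file size; the tag and file type are made up.

```python
import io
import struct


def read_split_chunk_archive(stream):
    tag = stream.read(4)
    files = []
    while True:
        header = stream.read(16)
        if len(header) < 16:                   # end of the archive reached
            break
        file_type, file_size, chunk_count, chunk_size = struct.unpack("<4sIII", header)
        remaining = file_size
        pieces = []
        for _ in range(chunk_count):
            # Every chunk is chunk_size bytes, except possibly the last one.
            piece = stream.read(min(chunk_size, remaining))
            pieces.append(piece)
            remaining -= len(piece)
        files.append((file_type.decode("ascii"), pieces))
    return tag, files


# One made-up 10-byte file stored as 3 chunks of up to 4 bytes each.
body = b"0123456789"
archive = b"SPLT" + struct.pack("<4sIII", b"TXT ", len(body), 3, 4) + body

tag, files = read_split_chunk_archive(io.BytesIO(archive))
print(files)
```

Joining the pieces with b"".join(pieces) gives back the original file.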
Tree Archives
Tree Archives are the most complicated of the archive types, and thankfully they are not used very often. The idea is that the archive tries to store a complete directory tree structure, such as the individual folders. This is usually done by creating a directory for each folder, and linking them together, as you will see in the example.
Here is a sample graphic representation of this archive type, however there can be many variations:
| Archive Header | ||
| 4 - Header Tag (String) 4 - Number of Folders at the root 4 - Total Number of Files | ||
| Folder Entries | ||
| Folder Entry 1 | ||
| X - Folder Name 4 - Number of Sub-folders in this folder 4 - Offset to the first Sub-folder entry for this folder 4 - Number of Files in this folder 4 - Offset to the first file entry for this folder | ||
| Folder Entry 2 | ||
| X - Folder Name 4 - Number of Sub-folders in this folder 4 - Offset to the first Sub-folder entry for this folder 4 - Number of Files in this folder 4 - Offset to the first file entry for this folder | ||
| … | ||
| Folder Entry n | ||
| X - Folder Name 4 - Number of Sub-folders in this folder 4 - Offset to the first Sub-folder entry for this folder 4 - Number of Files in this folder 4 - Offset to the first file entry for this folder | ||
| File Entries | ||
| File Entry 1 | ||
| 4 - File Offset 4 - File Size X - Filename | ||
| File Entry 2 | ||
| 4 - File Offset 4 - File Size X - Filename | ||
| … | ||
| File Entry n | ||
| 4 - File Offset 4 - File Size X - Filename | ||
| File Data | ||
| File Data 1 | ||
| X - File Data | ||
| File Data 2 | ||
| X - File Data | ||
| … | ||
| File Data n | ||
| X - File Data | ||
As this archive type is quite difficult to explain, I will provide an example here. Let’s pretend that our archive contains 3 files, as specified below:
\data\sounds\snd1.wav
\data\sounds\snd2.wav
\data\images\temp\pic1.bmp
The following diagram shows the structure of the archive that contains these 3 files (with the values of each field shown in green)
| Archive Header | ||
| 4 - Header Tag (String) HEAD 4 - Number of Folders at the root 1 4 - Total Number of Files 3 | ||
| Folder Entries | ||
| Folder Entry 1 | ||
| X - Folder Name data 4 - Number of Sub-folders in this folder 2 4 - Offset to first Sub-folder offset to Folder Entry 2 4 - Number of Files in this folder 0 4 - Offset to first file entry 0 | ||
| Folder Entry 2 | ||
| X - Folder Name sounds 4 - Number of Sub-folders in this folder 0 4 - Offset to first Sub-folder 0 4 - Number of Files in this folder 2 4 - Offset to first file entry offset to File Entry 1 | ||
| Folder Entry 3 | ||
| X - Folder Name images 4 - Number of Sub-folders in this folder 1 4 - Offset to first Sub-folder offset to Folder Entry 4 4 - Number of Files in this folder 0 4 - Offset to first file entry 0 | ||
| Folder Entry 4 | ||
| X - Folder Name temp 4 - Number of Sub-folders in this folder 0 4 - Offset to first Sub-folder 0 4 - Number of Files in this folder 1 4 - Offset to first file entry offset to File Entry 3 | ||
| File Entries | ||
| File Entry 1 | ||
| 4 - File Offset offset to File Data 1 4 - File Size length of File Data 1 X - Filename snd1.wav | ||
| File Entry 2 | ||
| 4 - File Offset offset to File Data 2 4 - File Size length of File Data 2 X - Filename snd2.wav | ||
| File Entry 3 | ||
| 4 - File Offset offset to File Data 3 4 - File Size length of File Data 3 X - Filename pic1.bmp | ||
| File Data | ||
| File Data 1 | ||
| X - File Data the data for file snd1.wav | ||
| File Data 2 | ||
| X - File Data the data for file snd2.wav | ||
| File Data 3 | ||
| X - File Data the data for file pic1.bmp | ||
Let’s walk through the reading of this file. The color orange indicates the name of a field in the example. The other colors are the same as the colors in the example.
First we read the Archive Header and see that there is only 1 folder at the root. This lets us know that we now need to read a single folder entry.
We read Folder Entry 1, called data, and are told there are 2 sub-folders. The sub-folders start at a certain offset in the archive.
So we skip to the offset of the sub-folders. For each of the 2 sub-folders, we need to read a folder entry. The first folder entry read is called sounds (Folder Entry 2) and there are 2 files in it. The second entry is called images (Folder Entry 3) and there is 1 sub-folder in it.
So we jump to the offset for the first file entry for the sounds folder, and read 2 file entries, namely snd1.wav (File Entry 1) and snd2.wav (File Entry 2). After we have read these, we jump back to where we were. We have finished with everything in the sounds folder, so we move on to the offset of the sub-folders for the images folder. We read 1 folder entry, called temp (Folder Entry 4), which has 1 file in it.
We jump forward to the offset for the first file entry and read 1 file entry, called pic1.bmp (File Entry 3). We know that the total number of files is 3, so now we have finished reading the tree.
Using this method, we can build up a complex directory tree. This type of archive is usually slightly smaller in size than the plain directory archive, because the filenames don’t have to repeat the entire folder string for each entry, however the compromise is that it takes longer to read because you are jumping all over the place. For this reason, and the fact that it is a very complex structure, only a few games use this type of structure.
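The walk through the example archive can be sketched in Python as follows. The sketch builds the 3-file example in memory and then reads it by jumping between offsets. The binary details are assumptions for illustration: Little-Endian 32-bit fields, null-terminated ASCII names, and made-up file contents; the helper names are hypothetical.

```python
import io
import struct


def read_name(stream):
    name = bytearray()
    while True:
        char = stream.read(1)
        if char in (b"", b"\x00"):
            break
        name += char
    return name.decode("ascii")


def read_folders(stream, offset, count, parent, found):
    """Read `count` consecutive folder entries at `offset`, then follow their links."""
    stream.seek(offset)
    folders = []
    for _ in range(count):
        name = read_name(stream)
        folders.append((name, *struct.unpack("<IIII", stream.read(16))))
    for name, sub_count, sub_offset, file_count, file_offset in folders:
        path = parent + "\\" + name
        if file_count:
            stream.seek(file_offset)       # jump to this folder's file entries
            for _ in range(file_count):
                data_offset, size = struct.unpack("<II", stream.read(8))
                found.append((path + "\\" + read_name(stream), data_offset, size))
        if sub_count:
            read_folders(stream, sub_offset, sub_count, path, found)


def folder(name, subs, sub_offset, files, file_offset):
    return name.encode("ascii") + b"\x00" + struct.pack("<IIII", subs, sub_offset, files, file_offset)


# Work out where every entry will sit, then build the archive.
folder_at = [12]                           # folder entries follow the 12-byte header
for n in ["data", "sounds", "images", "temp"]:
    folder_at.append(folder_at[-1] + len(n) + 1 + 16)
file_names = ["snd1.wav", "snd2.wav", "pic1.bmp"]
bodies = [b"one", b"two!", b"three"]       # made-up file contents
file_at = [folder_at[-1]]
for n in file_names:
    file_at.append(file_at[-1] + 8 + len(n) + 1)
data_at = [file_at[-1]]
for body in bodies:
    data_at.append(data_at[-1] + len(body))

archive = (b"HEAD" + struct.pack("<II", 1, 3)
           + folder("data", 2, folder_at[1], 0, 0)
           + folder("sounds", 0, 0, 2, file_at[0])
           + folder("images", 1, folder_at[3], 0, 0)
           + folder("temp", 0, 0, 1, file_at[2])
           + b"".join(struct.pack("<II", data_at[i], len(bodies[i])) + n.encode("ascii") + b"\x00"
                      for i, n in enumerate(file_names))
           + b"".join(bodies))

stream = io.BytesIO(archive)
tag = stream.read(4)
root_count, total_files = struct.unpack("<II", stream.read(8))
found = []
read_folders(stream, stream.tell(), root_count, "", found)
print(found)
```

All the folder entries at one level are read before any link is followed, because the entries for sibling folders sit next to each other in the archive.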
Nested Tree Archives
Nested Tree Archives, even though the name sounds hard, are a simpler version of the Tree Archive. The idea is the same as the Tree Archive: store a complete directory tree structure, however the Nested Tree Archive can be read more efficiently as there is no jumping around.
Here is a sample graphic representation of this archive type. The diagram is a little tricky to follow, so read the description and pseudo-code following it for a clearer explanation, then try to read the diagram:
| Archive Header | ||||||
| 4 - Header Tag (String) 4 - Number of Folders at the root 4 - Total Number of Files | ||||||
| Entries | ||||||
| Folder Entry 1 | ||||||
| X - Folder Name 4 - Number of Sub-folders 4 - Number of Files | ||||||
| Sub-Folder Entries 1 | ||||||
| Sub-Folder Entry 1a | ||||||
| X - Folder Name 4 - Number of Sub-folders 4 - Number of Files | ||||||
| Sub-Folder Entries 1.1 | ||||||
| Sub-Folder Entry 1.1a | ||||||
| X - Folder Name 4 - Number of Sub-folders 4 - Number of Files | ||||||
| … | ||||||
| Sub-Folder Entry 1.1n | ||||||
| X - Folder Name 4 - Number of Sub-folders 4 - Number of Files | ||||||
| File Entries 1.1 | ||||||
| File Entry 1.1a | ||||||
| 4 - File Offset 4 - File Size X - Filename | ||||||
| … | ||||||
| File Entry 1.1n | ||||||
| 4 - File Offset 4 - File Size X - Filename | ||||||
| … | ||||||
| Sub-Folder Entry 1n | ||||||
| X - Folder Name 4 - Number of Sub-folders 4 - Number of Files | ||||||
| File Entries 1 | ||||||
| File Entry 1a | ||||||
| 4 - File Offset 4 - File Size X - Filename | ||||||
| … | ||||||
| File Entry 1n | ||||||
| 4 - File Offset 4 - File Size X - Filename | ||||||
| … | ||||||
| Folder Entry n | ||||||
| X - Folder Name 4 - Number of Sub-folders 4 - Number of Files | ||||||
| Sub-Folder Entries n | ||||||
| File Entries n | ||||||
| File Data | ||||||
| File Data 1 | ||||||
| X - File Data | ||||||
| File Data 2 | ||||||
| X - File Data | ||||||
| … | ||||||
| File Data n | ||||||
| X - File Data | ||||||
The diagram may seem difficult, but that is mostly due to the fact that it is nested, meaning that it can have as many directories-inside-directories as you like.
This is the way it works. You first read the Archive Header, and see how many folders at the root there are. Usually there will only be 1 folder at the root.
So you read 1 Folder Entry, and find out the number of sub-folders, and the number of files. For every sub-folder, you repeat the same process: read its Folder Entry, then its own sub-folders, then its files. When you have read all the sub-folders, you then read all the File Entries.
If you can read pseudo-code, here is the kind of thing I am trying to describe:
method readArchive(){
    read (FolderEntry);
    for each (sub-folder){
        readArchive();
    }
    for each (file){
        read (FileEntry);
    }
}
So, you begin by reading a FolderEntry. If the entry has sub-folders in it, you must immediately read the entries for those sub-folders, by repeating the process from the beginning. When all the sub-folder entries for this FolderEntry have been read, you can then progress and read the FileEntries for the folder.
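For readers who prefer something runnable, the same recursion can be sketched in Python. The binary details are assumptions for illustration (Little-Endian 32-bit fields, null-terminated ASCII names), the helper names are hypothetical, and the file offsets and sizes in the sample are made-up placeholders, since no file data is appended in this sketch.

```python
import io
import struct


def read_name(stream):
    name = bytearray()
    while True:
        char = stream.read(1)
        if char in (b"", b"\x00"):
            break
        name += char
    return name.decode("ascii")


def read_folder(stream, parent, found):
    """Read one folder entry, then its sub-folders, then its file entries."""
    name = read_name(stream)
    sub_count, file_count = struct.unpack("<II", stream.read(8))
    path = parent + "\\" + name
    for _ in range(sub_count):
        read_folder(stream, path, found)   # sub-folders come straight after
    for _ in range(file_count):
        offset, size = struct.unpack("<II", stream.read(8))
        found.append((path + "\\" + read_name(stream), offset, size))


def folder(name, subs, files):
    return name.encode("ascii") + b"\x00" + struct.pack("<II", subs, files)


def file_entry(name, offset, size):
    return struct.pack("<II", offset, size) + name.encode("ascii") + b"\x00"


# The same 3 files as the Tree Archive example, stored in nested order.
entries = (folder("data", 2, 0)
           + folder("sounds", 0, 2)
           + file_entry("snd1.wav", 0, 3)
           + file_entry("snd2.wav", 3, 4)
           + folder("images", 1, 0)
           + folder("temp", 0, 1)
           + file_entry("pic1.bmp", 7, 5))
archive = b"HEAD" + struct.pack("<II", 1, 3) + entries

stream = io.BytesIO(archive)
tag = stream.read(4)
root_count, total_files = struct.unpack("<II", stream.read(8))
found = []
for _ in range(root_count):
    read_folder(stream, "", found)
print(found)
```

Notice that the reader never calls seek(): the entries are stored in exactly the order they are needed, which is why this archive type reads faster than the Tree Archive.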