Dreamfall PAK
Latest revision as of 12:14, 4 January 2021 (by Ikskoks)
PAK
- Format type: Archive
- Endianness: Little-endian
Format Specifications
char {12} - Identifier (tlj_pack0001)
uint32 {4} - Number of hash nodes
uint32 {4} - Number of name table indexes
uint32 {4} - Length of name table
// For each hash node
- uint32 {4} - File offset
- uint32 {4} - File size
- uint32 {4} - Base node index
- uint32 {4} - Path length
- uint32 {4} - Name table index
// Name table
- char {x} - Name table data // zero-terminated strings
// Name table indexes
- uint32 {x} - Name table index data
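A minimal Python sketch of reading the header and the hash node array with the standard struct module. It assumes the file starts with the 12-byte identifier "tlj_pack0001" ahead of the three counts, and that the sections follow each other without padding in the order listed above. The names parse_pak, PakHeader and HashNode are hypothetical.

```python
import struct
from dataclasses import dataclass

MAGIC = b"tlj_pack0001"           # 12-byte identifier
HEADER = struct.Struct("<12s3I")  # identifier + three little-endian uint32 values
NODE = struct.Struct("<5I")       # one hash node: five little-endian uint32 fields


@dataclass
class PakHeader:
    node_count: int
    index_count: int
    name_table_length: int


@dataclass
class HashNode:
    offset: int       # file offset
    size: int         # file size (0 = no file stored at this node)
    base: int         # base node index
    path_length: int  # path length
    name_index: int   # name table index


def parse_pak(data: bytes) -> tuple[PakHeader, list[HashNode], int]:
    """Parse the header and hash nodes; also return the name table's byte offset."""
    magic, nodes_n, idx_n, names_len = HEADER.unpack_from(data, 0)
    if magic != MAGIC:
        raise ValueError(f"unexpected identifier {magic!r}")
    header = PakHeader(nodes_n, idx_n, names_len)
    pos = HEADER.size
    nodes = []
    for _ in range(nodes_n):
        nodes.append(HashNode(*NODE.unpack_from(data, pos)))
        pos += NODE.size
    return header, nodes, pos  # pos now points at the first name table byte
```

Typical use would be something like `header, nodes, names_at = parse_pak(open("data.pak", "rb").read())`, where "data.pak" is a placeholder file name; the name table is then `data[names_at:names_at + header.name_table_length]`.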
Notes and Comments
Unfortunately, the file name addressing scheme used in this archive type is less of a real file system and more of a hashing technique. This makes it easy to extract files from the archive if their internal file name is already known, but extracting all files by "brute force" proves difficult since it heavily involves name-guessing.
General naming scheme
Names stored in PAK archives are not stored as plain ASCII text; instead, each character is encoded as a byte value in the range of 1 to 43. The characters below are listed in order of their encoded value, starting with 1 for "a":
abcdefghijklmnopqrstuvwxyz/<LF><CR>-_'.0123456789
- <CR> and <LF> denote the carriage return and line feed control characters, respectively. Oddly enough, they are included in the character table, but do not seem to appear in any observed file name.
- Slash and backslash are interchangeable (both encode as 27), but most files are internally referenced using the slash as the path delimiter.
- All other (i.e. unsupported) characters are encoded as 32, the same value as the apostrophe.
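The character table can be expressed as a pair of small conversion functions. This is a sketch that follows the rules above literally; the page does not say whether names are lowercased before encoding, so uppercase letters are treated as unsupported here (encoded as 32). encode_name and decode_name are hypothetical names.

```python
# Character table; the character at position i encodes as i + 1 (so "a" = 1, "9" = 43).
CHARSET = "abcdefghijklmnopqrstuvwxyz/\n\r-_'.0123456789"
APOSTROPHE = CHARSET.index("'") + 1  # 32, also used for every unsupported character

ENCODE = {ch: i + 1 for i, ch in enumerate(CHARSET)}
ENCODE["\\"] = ENCODE["/"]  # backslash shares the slash's code (27)


def encode_name(text: str) -> bytes:
    """Encode a path using the PAK character table (values 1..43)."""
    return bytes(ENCODE.get(ch, APOSTROPHE) for ch in text)


def decode_name(encoded: bytes) -> str:
    """Map encoded values back to characters; code 27 is always shown as '/'."""
    out = []
    for value in encoded:
        if not 1 <= value <= len(CHARSET):
            raise ValueError(f"byte {value} is outside the 1..43 code range")
        out.append(CHARSET[value - 1])
    return "".join(out)
```

Because unsupported characters collapse onto the apostrophe's code, decoding is lossy: decode_name(encode_name("A")) yields "'".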
Hash nodes
These entries form the basis of the actual name-lookup scheme. If a node's file size is non-zero, the file can be copied out of the archive by simply seeking to the node's file offset and copying as many bytes as the file size value indicates.
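Copying a file out of the archive once its hash node is known might look like the following sketch. It takes an open binary stream plus the node's offset and size; read_file_data is a hypothetical helper, and returning None for zero-size nodes is a choice made here, since the page only guarantees a stored file when the size is non-zero.

```python
import io
from typing import BinaryIO, Optional


def read_file_data(pak: BinaryIO, offset: int, size: int) -> Optional[bytes]:
    """Copy one stored file out of an open archive; None for zero-size nodes."""
    if size == 0:
        return None  # no file is guaranteed to be stored at this node
    pak.seek(offset)
    data = pak.read(size)
    if len(data) != size:
        raise EOFError(f"archive truncated: wanted {size} bytes at offset {offset}")
    return data


# Example with an in-memory stream standing in for a real .pak file:
demo = io.BytesIO(b"\x00" * 2048 + b"payload")
print(read_file_data(demo, 2048, 7))  # b'payload'
```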
Name table
The name table simply consists of encoded character strings separated by zero terminators. The length of the table (in bytes) is given by the "Length of name table" header field.
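The name table can be split into its encoded fragments with a short loop. This sketch keys each fragment by its starting byte offset, because the page does not state whether a hash node's name table index is a byte offset or a position in the index array; keeping offsets lets either interpretation be tried. split_name_table is a hypothetical name.

```python
def split_name_table(table: bytes) -> dict[int, bytes]:
    """Map each string's starting byte offset to its encoded bytes (terminator removed)."""
    fragments = {}
    start = 0
    while start < len(table):
        end = table.find(b"\x00", start)
        if end == -1:  # tolerate a missing final terminator
            end = len(table)
        fragments[start] = table[start:end]
        start = end + 1
    return fragments
```

For a table holding the encoded strings "irect" and "ry", the result is {0: b'\x09\x12\x05\x03\x14', 6: b'\x12\x19'}.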
Name table indexes
This array of uint32 values specifies indexes into the name table.
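Reading this array is straightforward if, as the order of the format specification suggests, it directly follows the name table. That placement (with no padding in between) is an assumption of this sketch, and read_name_indexes is a hypothetical name; the count comes from the "Number of name table indexes" header field.

```python
import struct


def read_name_indexes(data: bytes, names_offset: int, names_length: int, count: int) -> list[int]:
    """Read `count` little-endian uint32 values assumed to follow the name table."""
    start = names_offset + names_length
    end = start + 4 * count
    if end > len(data):
        raise ValueError("name table index array runs past the end of the file")
    return list(struct.unpack_from(f"<{count}I", data, start))
```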
Extracting a file by name
If the full path and name of a file are known, it can be extracted by the following method:
- Set the index of the current hash node to zero.
- Encode the next (not yet processed) character of the path by means of the conversion table.
- Use this encoding as a relative offset from the base node index specified in the current hash node to look up the next hash node. This will be the new "current" node.
- Use the name table index of the current hash node to look up the matching character string in the name table. This string specifies a further part of the path.
- Repeat the steps (apart from the first) until the whole path has been processed. The file can then be extracted using the offset and size values of the last hash node.
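The steps above can be sketched as a single loop over the encoded path. Several details are assumptions, labeled in the comments: the caller supplies a fragment(name_index) lookup (since the exact meaning of the name table index is not pinned down), a node index past the end of the node array counts as "not found", and the path length check compares against the number of characters matched so far, including the node's name fragment. Node and find_node are hypothetical names.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence

CHARSET = "abcdefghijklmnopqrstuvwxyz/\n\r-_'.0123456789"


@dataclass
class Node:  # hypothetical container for one hash node
    offset: int
    size: int
    base: int
    path_length: int
    name_index: int


def encode_name(text: str) -> bytes:
    codes = {ch: i + 1 for i, ch in enumerate(CHARSET)}
    codes["\\"] = codes["/"]                     # slash and backslash share code 27
    return bytes(codes.get(ch, 32) for ch in text)  # unsupported -> 32


def find_node(path: str, nodes: Sequence[Node],
              fragment: Callable[[int], bytes]) -> Optional[Node]:
    """Follow the hash nodes for `path`; return the final node, or None if absent."""
    target = encode_name(path)
    current = nodes[0]  # start at hash node zero
    pos = 0             # number of encoded characters matched so far
    while pos < len(target):
        index = current.base + target[pos]  # next character, relative to base index
        if index >= len(nodes):             # ASSUMPTION: out of range means not found
            return None
        current = nodes[index]
        pos += 1
        part = fragment(current.name_index)  # further part of the path (encoded)
        if target[pos:pos + len(part)] != part:
            return None                      # wrong character string
        pos += len(part)
        if current.path_length != pos:       # ASSUMPTION: length matched so far
            return None
    return current
```

Comparing encoded bytes rather than decoded text keeps slash/backslash equivalence and the shared apostrophe code intact. The caller should still check that the returned node's file size is non-zero before extracting.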
For clarification, consider the following example, in which the file "directory/file.dat" is to be extracted:
- Start at hash node zero. The base node index value of this node is usually zero as well.
- The first character of the fully-qualified file name is "d"; its encoding is 0x04. Thus, if the current base node index value is indeed zero, change to the hash node at index 0x04.
- Look up the character string in the name table. This should be a further part of the path, e.g. the string "irect".
- Take the next character ("o" in the example) and again use its encoding (0x0f) as a relative offset from the current base node index. If the base node index of node 0x04 is, e.g., 0x20, the next hash node is 0x20 + 0x0f = 0x2f.
- Here, we might finally find the rest of the path by once again taking a look into the name table. If the character string found is "ry/file.dat", we're done searching and can extract the file.
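The node arithmetic of this example can be checked directly. The base node index 0x20 is the same illustrative value used in the example, not a value taken from a real archive; code is a hypothetical helper.

```python
CHARSET = "abcdefghijklmnopqrstuvwxyz/\n\r-_'.0123456789"


def code(ch: str) -> int:
    """Encoded value of a single supported character (table position + 1)."""
    return CHARSET.index(ch) + 1


first_node = 0x00 + code("d")   # base index of node 0 is zero, "d" encodes as 0x04
second_node = 0x20 + code("o")  # illustrative base index 0x20, "o" encodes as 0x0f
print(hex(code("d")), hex(first_node))   # 0x4 0x4
print(hex(code("o")), hex(second_node))  # 0xf 0x2f

# The characters consumed and the name table fragments together spell the full path.
assert "d" + "irect" + "o" + "ry/file.dat" == "directory/file.dat"
```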
Note that file names can be broken into fragments at any point. Also, the path length value of each hash node specifies the length of the whole path processed up to that point in the search.
If any of the above operations fails during the node search (e.g. a path length mismatch or a wrong character string), the requested file is not contained in this archive. Conversely, a single file entry within the archive might be (and often is) reachable by multiple paths. Though probably only one of these paths is the intended one, all of them are valid in the context of this structure. For this reason, a list of "correct" names for the files in a given archive cannot be created automatically; any attempt to do so would have to rely heavily on forward- or backward-guessing.
Further notes
- The whole file might be organized in "sectors" of 2048 (?) bytes in size; the alignment of the file offsets indicates this.
- Some real file names for the extraction process can be seen inside the compiled Shark3D binaries contained in the individual archives.
- One file, resource.pak (which might not exist in all language versions of the game), does not follow the format described here. Judging from the header, it is a StarForce-protected archive file. Its contents are currently unknown.
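Two quick checks follow from these notes: testing the 2048-byte sector guess against the file offsets of a real archive, and skipping files such as resource.pak whose header does not start with the expected identifier. This is a sketch; common_alignment and looks_like_tlj_pack are hypothetical names, and a greatest common divisor only hints at the sector size when enough offsets are available.

```python
from functools import reduce
from math import gcd

MAGIC = b"tlj_pack0001"


def looks_like_tlj_pack(header: bytes) -> bool:
    """True if the first 12 bytes carry the identifier of this archive type."""
    return header[:12] == MAGIC


def common_alignment(offsets) -> int:
    """Greatest common divisor of the non-zero file offsets (0 if there are none)."""
    return reduce(gcd, (o for o in offsets if o), 0)
```

If common_alignment over all node offsets is a multiple of 2048, the offsets are consistent with 2048-byte sectors, though that alone does not prove the file is organized that way.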
MultiEx BMS Script
None written yet.
Supported by Programs
Unknown
Links
None