Dreamfall PAK
PAK
- Format Type : Archive
- Endian Order : Little Endian
Format Specifications
// PAK header
char {12} - Identifier ("tlj_pack0001")
uint32 {4} - Number of hash nodes
uint32 {4} - Number of name table indexes
uint32 {4} - Length of name table
Following the header are three arrays, stored in this order: hash nodes, the name table, and the name table indexes. Note that the header lists the lengths of the last two arrays in the opposite order.
The file name addressing scheme used in this archive type is a hashing technique rather than a real file system. A file is easy to extract if its internal file name is already known, but extracting all files by "brute force" is difficult because it relies heavily on guessing names.
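The header layout above can be read with a few lines of Python. This is a minimal sketch, not a verified implementation: it assumes the three arrays are packed directly after the header with no padding, and the names parse_header, nodes_offset, names_offset and indexes_offset are made up for illustration.

```python
import struct

# identifier, number of hash nodes, number of name table indexes, name table length
HEADER = struct.Struct("<12sIII")
NODE_SIZE = 20  # five uint32 fields per hash node


def parse_header(data):
    """Parse the PAK header and compute where each array starts."""
    magic, node_count, index_count, name_table_len = HEADER.unpack_from(data, 0)
    if magic != b"tlj_pack0001":
        raise ValueError("identifier mismatch, not a Dreamfall PAK header")
    # Assumption: arrays follow the header back to back, in the order
    # hash nodes, name table, name table indexes.
    nodes_offset = HEADER.size
    names_offset = nodes_offset + node_count * NODE_SIZE
    indexes_offset = names_offset + name_table_len
    return {
        "node_count": node_count,
        "index_count": index_count,
        "name_table_len": name_table_len,
        "nodes_offset": nodes_offset,
        "names_offset": names_offset,
        "indexes_offset": indexes_offset,
    }
```

If the arrays turn out to be padded or aligned, only the three offset computations need to change.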
General naming scheme
Names stored in PAK archives are not visible as ASCII characters; each character is encoded as a uint8 value in the range of 1 to 44. The values map, in order and starting at 1, to the following characters:
abcdefghijklmnopqrstuvwxyz/??-_?.0123456789
Here, a "?" denotes a still unknown character. There might actually be more characters, but none are known yet.
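The conversion table can be expressed as a pair of small helper functions. This is a sketch under the assumption that value 1 corresponds to the first listed character (which matches "d" being 0x04 in the example further down). The list as given contains 43 characters while the range is stated as 1 to 44, so values for characters near an unknown or missing position may be off. The names CHARSET, encode_char and decode_name are illustrative only.

```python
# Character list copied from the specification; "?" marks unknown characters.
CHARSET = "abcdefghijklmnopqrstuvwxyz/??-_?.0123456789"


def encode_char(ch):
    """Return the 1-based code of a character, or raise if it is not known."""
    pos = CHARSET.find(ch)
    if ch == "?" or pos < 0:
        raise ValueError("character %r has no known encoding" % ch)
    return pos + 1


def decode_name(raw):
    """Decode a sequence of codes; unknown or out-of-range codes become '?'."""
    return "".join(
        CHARSET[b - 1] if 1 <= b <= len(CHARSET) else "?" for b in raw
    )
```

For example, encode_char("d") yields 4 and encode_char("o") yields 15, in line with the worked example.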
File system nodes
// Hash node
uint32 {4} - File offset
uint32 {4} - File size
uint32 {4} - Base node index
uint32 {4} - Path length
uint32 {4} - Name table index
These entries form the basis of the actual name lookup scheme. If the file size of a node is non-zero, the file can be copied out of the archive by seeking to the file offset and copying as many bytes as the file size indicates.
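Reading the hash nodes and copying out a file might look as follows. This sketch assumes the file offset is absolute (measured from the start of the archive) and that the whole archive is held in memory; HashNode, read_nodes and extract are hypothetical names, not part of any existing tool.

```python
import struct
from typing import NamedTuple


class HashNode(NamedTuple):
    offset: int       # file offset
    size: int         # file size
    base_index: int   # base node index
    path_length: int  # path length
    name_index: int   # name table index


NODE = struct.Struct("<5I")  # five little-endian uint32 fields, 20 bytes


def read_nodes(data, start, count):
    """Read `count` consecutive hash nodes beginning at byte `start`."""
    return [
        HashNode(*NODE.unpack_from(data, start + i * NODE.size))
        for i in range(count)
    ]


def extract(data, node):
    """Return the file bytes for a node, or None if the node holds no file."""
    if node.size == 0:
        return None
    # Assumption: the offset is relative to the start of the archive.
    return data[node.offset:node.offset + node.size]
```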
Name table
The name table consists of encoded character strings, each terminated by a zero byte. The length of the table (in bytes) is given by the corresponding header field.
Name table indexes
This array of uint32 values specifies indexes into the name table.
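A possible way to resolve a name string is sketched below. The specification does not say how the index values are to be interpreted, so two assumptions are made here: the name table index of a hash node selects an entry of this array, and that entry is a byte offset into the name table. Both should be checked against real archives. The function names are illustrative.

```python
import struct


def read_name_indexes(data, start, count):
    """Read `count` little-endian uint32 values beginning at byte `start`."""
    return list(struct.unpack_from("<%dI" % count, data, start))


def read_name(name_table, name_indexes, name_index):
    """Return the still-encoded string that a name table index refers to."""
    # Assumption: the array entry is a byte offset into the name table.
    start = name_indexes[name_index]
    end = name_table.find(b"\x00", start)
    if end < 0:
        end = len(name_table)  # tolerate a missing final terminator
    return name_table[start:end]
```

The returned bytes are still in the 1 to 44 encoding and have to be converted through the character list before they can be compared with a path.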
Extracting a file by name
If the full path and name of a file are known, it can be extracted by the following method:
- Set the index of the current hash node to zero.
- Encode the next (not yet processed) character of the path by means of the conversion table.
- Use this encoding as a relative offset from the base node index specified in the current hash node to look up the next hash node. This will be the new "current" node.
- Use the name table index of the current hash node to look up the matching character string in the name table. This string specifies a further part of the path.
- Repeat the steps (apart from the first) until the whole path has been processed. The file can then be extracted using the offset and size values of the last hash node.
For clarification, consider the following example: The file "directory/file.dat" has to be extracted. Then:
- Start at hash node zero. The base node index value of this node is usually zero as well.
- The first character of the fully-qualified file name is "d", its encoding is 0x04. Thus, if the current base node index value is indeed zero, change to the hash node at index 0x04.
- Look up the character string in the name table. This should be a further part of the path, e.g. the string "irect".
- Take the next character ("o" in the example) and again use its encoding (0x0f) as a relative offset from the base node index of the current node. If that base node index is e.g. 0x20, the next hash node will be 0x2f.
- Here, we might finally find the rest of the path by once again looking into the name table. If the character string found is "ry/file.dat", the search is complete and the file can be extracted.
Note that the file names can be broken up at any point. Also, the path length value of each hash node specifies the length of the whole path (excluding trailing slashes) up to any point in the search process.
If any of the above operations fails during the node search (e.g. path length mismatch, wrong character string), the requested file is not contained in this archive. Conversely, a single file entry within an archive might be reachable by multiple paths. Although probably only one of these paths is the intended one, all of them are valid in the context of this structure. Because of this, creating a list of "correct" names for each file contained in a given archive is impossible. Any attempt to do so would need to rely heavily on forward- or backward-guessing.
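The lookup procedure and its failure conditions can be summarized in one function. This is a sketch that works on already-parsed data: nodes is a list of (file offset, file size, base node index, path length, name table index) tuples, and names is a list of already-decoded strings selected by the name table index. How that index maps onto the raw name table is an assumption left to the caller. The path length check follows the description above (length of the path matched so far, without a trailing slash). The name find_node is hypothetical.

```python
CHARSET = "abcdefghijklmnopqrstuvwxyz/??-_?.0123456789"


def find_node(path, nodes, names):
    """Walk the hash nodes for `path`; return the final node or None."""
    if not path:
        return None
    current = 0  # start at hash node zero
    pos = 0      # number of path characters processed so far
    while pos < len(path):
        ch = path[pos]
        code = CHARSET.find(ch) + 1  # 1-based encoding, 0 if not listed
        if ch == "?" or code == 0:
            return None
        nxt = nodes[current][2] + code  # base node index + encoding
        if nxt >= len(nodes):
            return None
        current = nxt
        pos += 1
        name_index = nodes[current][4]
        if name_index >= len(names):
            return None
        fragment = names[name_index]
        if not path.startswith(fragment, pos):
            return None  # wrong character string
        pos += len(fragment)
        # Path length counts the path so far, excluding a trailing slash.
        if nodes[current][3] != len(path[:pos].rstrip("/")):
            return None  # path length mismatch
    return nodes[current]
```

With a node table built to match the worked example (node 0x04 with base node index 0x20 and the string "irect", node 0x2f with the string "ry/file.dat"), find_node("directory/file.dat", nodes, names) returns node 0x2f, whose offset and size can then be used to copy the file.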
Further notes
- The whole file might be organized in "sectors" of 2048 (?) bytes in size; the alignment of the file offsets indicates this.
- Some real file names for the extraction process can be found inside the compiled Shark3D binaries stored within the individual archives.
- One file, resource.pak, does not follow the format described here, but is structured in a completely different (and as yet unknown) way. Its purpose is still unclear as well.
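The sector hypothesis from the first note can be checked against the file offsets of an archive. The following sketch simply lists offsets that are not a multiple of the assumed sector size; the 2048-byte default is the unconfirmed value from the note, and misaligned_offsets is an illustrative name.

```python
def misaligned_offsets(offsets, sector_size=2048):
    """Return the offsets that do not fall on a sector boundary."""
    return [off for off in offsets if off % sector_size != 0]
```

An empty result for the offsets of all nodes with a non-zero file size would support the sector assumption; any entry in the result contradicts it for that sector size.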
MultiEx BMS Script
Not written yet
Compatible Programs
Unknown