Dreamfall PAK

From XentaxWiki

Contributors include DenizOezmen (added preliminary format description) and Ikskoks.
Latest revision as of 12:14, 4 January 2021


PAK

  • Format type: Archive
  • Endian order: Little endian
  • Platform: PC

Format Specifications

char {12}    - Identifier (tlj_pack0001)
uint32 {4}   - Number of hash nodes
uint32 {4}   - Number of name table indexes
uint32 {4}   - Length of name table (in bytes)

// For each hash node
uint32 {4}   - File offset
uint32 {4}   - File size
uint32 {4}   - Base node index
uint32 {4}   - Path length
uint32 {4}   - Name table index

// Name table
char {x}     - Name table data // zero-terminated encoded strings, x = length of name table

// Name table indexes
uint32 {x}   - Name table index data // one uint32 per name table index
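The layout above can be read with a few lines of Python. This is a minimal sketch, assuming little-endian integers and that the hash nodes, name table and name table indexes follow the 24-byte header back to back in the order listed; the names `HashNode`, `PakIndex` and `read_pak_index` are hypothetical.

```python
import struct
from dataclasses import dataclass
from typing import BinaryIO, List

MAGIC = b"tlj_pack0001"
HEADER = struct.Struct("<12s3I")  # identifier + three uint32 header fields (24 bytes)
NODE = struct.Struct("<5I")       # one hash node (20 bytes)


@dataclass
class HashNode:
    offset: int       # absolute file offset
    size: int         # file size (may be zero)
    base: int         # base node index
    path_length: int  # path length
    name_index: int   # name table index


@dataclass
class PakIndex:
    nodes: List[HashNode]
    name_table: bytes        # encoded, zero-terminated strings
    name_indexes: List[int]  # contents of the name table index array


def read_pak_index(f: BinaryIO) -> PakIndex:
    """Read the header, hash nodes, name table and name table indexes."""
    magic, node_count, index_count, table_len = HEADER.unpack(f.read(HEADER.size))
    if magic != MAGIC:
        raise ValueError(f"not a tlj_pack0001 archive: {magic!r}")
    nodes = [HashNode(*NODE.unpack(f.read(NODE.size))) for _ in range(node_count)]
    name_table = f.read(table_len)
    if len(name_table) != table_len:
        raise ValueError("truncated name table")
    name_indexes = list(struct.unpack(f"<{index_count}I", f.read(4 * index_count)))
    return PakIndex(nodes, name_table, name_indexes)
```

Typical use would be `with open(path, "rb") as f: index = read_pak_index(f)`. Only the index structures are read; the file data stays on disk until it is needed.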

Notes and Comments

Unfortunately, the file name addressing scheme used in this archive type is less a real file system than a hashing technique. This makes it easy to extract a file whose internal name is already known, but extracting all files by "brute force" is difficult because it relies heavily on guessing names.

General naming scheme

Names stored in PAK archives are not plain ASCII text; every character is encoded as a byte value between 1 and 43. The characters below are listed in order of their encoded value, so that a = 1, b = 2, and so on up to 9 = 43:

abcdefghijklmnopqrstuvwxyz/<LF><CR>-_'.0123456789
  • <CR> and <LF> denote the carriage return and line feed control characters, respectively. Oddly enough, they are included in the character table but do not seem to appear in any observed file name.
  • Slash and backslash are interchangeable, but most files are internally referenced with the slash as the path delimiter.
  • All other (i.e. unsupported) characters are encoded as 32, the same value as the apostrophe.
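The character table translates directly into a lookup table. A minimal Python sketch follows; `encode_name` and `decode_name` are hypothetical helper names. Because only lowercase letters appear in the table, the sketch encodes uppercase letters as unsupported (32). Whether the game lowercases paths before encoding is not documented, so callers may need to try lowercased names.

```python
# Character table from the list above: the character at 0-based position i
# is encoded as i + 1, so "a" = 1 ... "9" = 43. Value 0 terminates a string.
CHARSET = "abcdefghijklmnopqrstuvwxyz/\n\r-_'.0123456789"
UNSUPPORTED = 32  # same value as the apostrophe

ENCODE = {ch: i + 1 for i, ch in enumerate(CHARSET)}
ENCODE["\\"] = ENCODE["/"]  # backslash is interchangeable with slash


def encode_name(path: str) -> bytes:
    """Encode a path into PAK name bytes (without the zero terminator)."""
    return bytes(ENCODE.get(ch, UNSUPPORTED) for ch in path)


def decode_name(data: bytes) -> str:
    """Decode PAK name bytes to text; out-of-range values become "?".

    Unsupported characters were encoded as 32, so they decode as "'".
    """
    return "".join(CHARSET[b - 1] if 1 <= b <= len(CHARSET) else "?" for b in data)
```

Decoding is lossy only for characters that were unsupported at encoding time, since they all collapse onto the apostrophe.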

Hash nodes

These entries form the basis of the name-lookup scheme. If a node's file size is non-zero, the file can be copied out of the archive by seeking to the file offset and copying as many bytes as the file size indicates.
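Copying a file out of the archive is a plain seek-and-read. In this sketch (hypothetical helper `read_file_data`), nodes with a zero file size return `None`, because their purpose is not known; that choice is an assumption, not part of the format.

```python
from typing import BinaryIO, Optional


def read_file_data(f: BinaryIO, offset: int, size: int) -> Optional[bytes]:
    """Copy one file out of the archive using a hash node's offset and size."""
    if size == 0:
        return None  # zero-size nodes are not known to describe file data
    f.seek(offset)
    data = f.read(size)
    if len(data) != size:
        raise ValueError("archive truncated: file extends past end of data")
    return data
```

For very large files, reading in fixed-size chunks would avoid holding the whole file in memory.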

Name table

The name table consists of encoded character strings separated by zero terminators. Its length in bytes is given by the corresponding header field.
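It is not stated whether indexes into the name table are byte offsets or ordinal string numbers. The following sketch assumes byte offsets; `read_name_at` and `split_name_table` are hypothetical helpers, and the returned bytes are still encoded (they can be decoded with the character table given earlier).

```python
from typing import Dict


def read_name_at(name_table: bytes, offset: int) -> bytes:
    """Return the encoded string starting at byte `offset` (terminator excluded)."""
    end = name_table.find(b"\x00", offset)
    if end < 0:  # last string without a terminator
        end = len(name_table)
    return name_table[offset:end]


def split_name_table(name_table: bytes) -> Dict[int, bytes]:
    """Map the starting byte offset of every string to its encoded bytes."""
    strings: Dict[int, bytes] = {}
    offset = 0
    while offset < len(name_table):
        s = read_name_at(name_table, offset)
        strings[offset] = s
        offset += len(s) + 1  # skip past the zero terminator
    return strings
```

If the indexes turn out to be string numbers instead, the keys of `split_name_table` can be replaced by a running counter.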

Name table indexes

This array of uint32 values specifies indexes into the name table. The number of entries is given by the corresponding header field.

Extracting a file by name

If the full path and name of a file are known, the file can be extracted as follows:

  • Set the index of the current hash node to zero.
  • Encode the next (not yet processed) character of the path using the character table.
  • Add this encoded value to the base node index of the current hash node; the result is the index of the next hash node, which becomes the new current node.
  • Use the name table index of the current hash node to look up the matching character string in the name table. This string must match the next part of the path, which is thereby processed as well.
  • Repeat all steps except the first until the whole path has been processed. The file can then be extracted using the offset and size values of the last hash node.
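The lookup steps can be sketched in Python as follows. Two details are assumptions rather than documented facts: how a node's name table index resolves to a string (so the caller supplies `fragment_of`), and that a node's path length equals the number of path characters matched once its own string has been consumed (this check can be switched off with `check_path_length=False`). `Node`, `encode_char` and `find_node` are hypothetical names.

```python
from typing import Callable, NamedTuple, Optional, Sequence

CHARSET = "abcdefghijklmnopqrstuvwxyz/\n\r-_'.0123456789"


class Node(NamedTuple):
    offset: int
    size: int
    base: int          # base node index
    path_length: int
    name_index: int


def encode_char(ch: str) -> int:
    """Encode one character; backslash acts as slash, unsupported chars give 32."""
    if ch == "\\":
        ch = "/"
    pos = CHARSET.find(ch)
    return pos + 1 if pos >= 0 else 32


def find_node(nodes: Sequence[Node], fragment_of: Callable[[Node], bytes],
              path: str, check_path_length: bool = True) -> Optional[Node]:
    """Follow the hash nodes for `path`; return the last node, or None if absent.

    `fragment_of(node)` must return the node's encoded name table string.
    """
    key = bytes(encode_char(ch) for ch in path)
    if not nodes or not key:
        return None
    current, pos = 0, 0
    while pos < len(key):
        nxt = nodes[current].base + key[pos]  # encoded char = relative offset
        if nxt >= len(nodes):
            return None
        current, pos = nxt, pos + 1
        fragment = fragment_of(nodes[current])
        if key[pos:pos + len(fragment)] != fragment:
            return None  # wrong character string
        pos += len(fragment)
        # Assumed meaning: characters matched so far, including this fragment.
        if check_path_length and nodes[current].path_length != pos:
            return None  # path length mismatch
    return nodes[current]
```

If the name table index is a byte offset into the name table, `fragment_of` could be `lambda n: name_table[n.name_index:name_table.index(b"\x00", n.name_index)]`; if it indexes the name table index array instead, look the offset up there first.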

For clarification, consider the following example, in which the file "directory/file.dat" is to be extracted:

  • Start at hash node zero. The base node index of this node is usually zero as well.
  • The first character of the fully qualified file name is "d", whose encoding is 0x04. If the current base node index is indeed zero, move to the hash node at index 0x04.
  • Look up the character string of this node in the name table. It should be a further part of the path, e.g. the string "irect".
  • Take the next character ("o" in the example) and again use its encoding (0x0f) as a relative offset from the current node's base node index. If that base node index is, say, 0x20, the next hash node is 0x20 + 0x0f = 0x2f.
  • Here, the name table may finally yield the rest of the path. If the character string found is "ry/file.dat", the search is complete and the file can be extracted.
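The index arithmetic in this example can be checked in a few lines of Python. The base node index 0x20 for node 0x04 is the hypothetical value from the example, and `code` is a hypothetical helper.

```python
CHARSET = "abcdefghijklmnopqrstuvwxyz/\n\r-_'.0123456789"


def code(ch: str) -> int:
    """Encoded value of a supported character (position in the table, plus one)."""
    return CHARSET.index(ch) + 1


node_a = 0 + code("d")     # node 0 has base node index 0 -> node 0x04
node_b = 0x20 + code("o")  # node 0x04 is assumed to have base node index 0x20
matched = "d" + "irect" + "o" + "ry/file.dat"  # characters + name table strings

print(hex(node_a), hex(node_b), matched == "directory/file.dat")
# prints: 0x4 0x2f True
```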

Note that file names can be split at any point. Also, the path length value of each hash node specifies the length of the whole path up to that point in the search process.

If any of the above operations fails during the node search (e.g. a path length mismatch or a wrong character string), the requested file is not contained in this archive. Conversely, a single file entry may be (and often is) reachable by multiple paths. Although probably only one of these paths is the intended one, all of them are valid in terms of this structure. For this reason, a list of "correct" names for the files in a given archive cannot be created automatically; any attempt to do so would have to rely heavily on forward- or backward-guessing.

Further notes

  • The whole file might be organized in "sectors" of 2048 (?) bytes in size; the alignment of the file offsets indicates this.
  • Some real file names for the extraction process can be found inside the compiled Shark3D binaries within the individual archives.
  • One file, resource.pak (which might not exist in all language versions of the game), does not follow the format described here. Judging from the header, it is a StarForce-protected archive file. Its contents are currently unknown.
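The 2048-byte sector hypothesis can be tested by counting how many file offsets are multiples of 2048. A minimal sketch with a hypothetical helper `alignment_stats`; only the offsets of hash nodes with a non-zero file size should be passed in, since zero-size nodes are not known to describe file data.

```python
from typing import Iterable, Tuple

SECTOR = 2048  # hypothesised sector size from the note above


def alignment_stats(offsets: Iterable[int], sector: int = SECTOR) -> Tuple[int, int]:
    """Return (number of offsets aligned to `sector`, total number of offsets)."""
    aligned = total = 0
    for off in offsets:
        total += 1
        if off % sector == 0:
            aligned += 1
    return aligned, total
```

A result where `aligned` equals `total` across several archives would support the sector hypothesis; it would not prove it.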

MultiEx BMS Script

None written yet.

Supported by Programs

Unknown

Links

None

Games

  • Dreamfall (*.pak)
