Root File Format

The Root file is the primary catalog of all files stored in CASC archives. It maps FileDataIDs, and for named entries the hash of the file path, to content keys so that game clients can locate and retrieve specific assets.

Overview

The Root file serves as the master index for all game content:

  • Maps FileDataIDs to content keys

  • Supports multiple locales and content flags

  • Groups files into blocks for efficient lookup

  • Handles both named and unnamed entries

File Structure

The Root file is BLTE-encoded and organized into blocks:

[BLTE Container]
  [Header]
  [Block 1]
  [Block 2]
  ...
  [Block N]

Binary Format

Version Detection

The Root file format has evolved significantly:

  • Pre-30080: No MFST magic, raw block data

  • Build 30080+ (v2): MFST magic with file counts

  • Build 50893+ (v3): Added header_size/version fields

  • Build 58221+ (v4): Extended content flags to 40 bits

Header Structures

Version 2 (Build 30080+)

struct RootHeaderV2 {
    uint32_t magic;              // 'MFST' (0x4D465354) or 'TSFM' (0x5453464D)
    uint32_t total_file_count;   // Total number of files
    uint32_t named_file_count;   // Number of named entries
};

Note: Some builds use ‘TSFM’ magic instead of ‘MFST’. The two are the same four bytes in reverse order, which suggests the magic is written as a little-endian u32 in those builds. Both should be accepted as valid.

Version 3 (Build 50893+)

struct RootHeaderV3 {
    uint32_t magic;              // 'MFST' (0x4D465354) or 'TSFM' (0x5453464D)
    uint32_t header_size;        // Size of header (20 bytes)
    uint32_t version;            // Version (1)
    uint32_t total_file_count;   // Total number of files
    uint32_t named_file_count;   // Number of named entries
    uint32_t padding;            // Padding (0)
};

Note: Version 3 also uses TSFM magic in observed builds, maintaining consistency with Version 2.

Version Detection Heuristic: After reading the magic, check the next two u32 values. If the first value (header_size) is in range [16, 100) and the second value (version) is less than 10, the file is v3+. Otherwise treat the first value as total_file_count (v2). Version 1 maps to V2 block format.
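The heuristic above can be folded into a header reader. The following is a minimal sketch, not the reference implementation: the `RootHeader` type and the `parse_root_header` and `read_u32_le` names are hypothetical, and it assumes all header fields are stored little-endian.

```rust
/// Header fields shared by v2 and v3+ (hypothetical type for this sketch).
#[derive(Debug, PartialEq)]
pub struct RootHeader {
    /// `None` for v2 headers, which carry no version field.
    pub version: Option<u32>,
    pub total_file_count: u32,
    pub named_file_count: u32,
}

/// Reads a little-endian u32 at `offset`, or `None` if the input is too short.
fn read_u32_le(data: &[u8], offset: usize) -> Option<u32> {
    let b = data.get(offset..offset + 4)?;
    Some(u32::from_le_bytes([b[0], b[1], b[2], b[3]]))
}

/// Returns `None` when there is no MFST/TSFM magic (pre-30080 file)
/// or when the input is truncated.
pub fn parse_root_header(data: &[u8]) -> Option<RootHeader> {
    let magic = data.get(0..4)?;
    if magic != b"MFST" && magic != b"TSFM" {
        return None;
    }

    let first = read_u32_le(data, 4)?;
    let second = read_u32_le(data, 8)?;

    if (16..100).contains(&first) && second < 10 {
        // v3+: `first` is header_size, `second` is version.
        Some(RootHeader {
            version: Some(second),
            total_file_count: read_u32_le(data, 12)?,
            named_file_count: read_u32_le(data, 16)?,
        })
    } else {
        // v2: the two values are the file counts themselves.
        Some(RootHeader {
            version: None,
            total_file_count: first,
            named_file_count: second,
        })
    }
}
```

Returning `Option` keeps truncated input from panicking; a caller that sees `None` on a file without magic would fall back to V1 block parsing from offset 0.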

Block Structure

Each block contains file entries for specific locale and content flag combinations. Important: The block header format changed significantly between V1 and V2+.

V1 Block Header (Pre-30080, 12 bytes)

V1 files have no MFST/TSFM magic and use a 12-byte block header with interleaved record format:

struct RootBlockHeaderV1 {
    uint32_t num_records;        // Number of records in block
    uint32_t content_flags;      // Content flags (32-bit)
    uint32_t locale_flags;       // Locale flags (language/region)

    // FileDataID deltas (delta-encoded)
    int32_t fileDataIDDeltas[num_records];

    // Interleaved record data (content_key + name_hash per record)
    RootRecordInterleaved records[num_records];
};

V2+ Block Header (Build 30080+, 17 bytes)

V2 and later versions have MFST/TSFM magic and use a 17-byte block header with separated arrays. Per wowdev.wiki documentation for Version 2 (11.1.0+):

#pragma pack(push, 1)
struct RootBlockHeaderV2 {
    uint32_t num_records;        // Number of records in block
    uint32_t locale_flags;       // Locale flags (MOVED - was third in V1!)
    uint32_t content_flags;      // Content flags (was second in V1)
    uint32_t unk2;               // Unknown field 2
    uint8_t  unk3;               // Unknown field 3 (flags via bit-shift)

    // FileDataID deltas (delta-encoded)
    int32_t fileDataIDDeltas[num_records];

    // Separated arrays (all content_keys, then all name_hashes)
    uint8_t content_keys[num_records][16];
    uint8_t name_hashes[num_records][8];  // Optional based on flags
};
#pragma pack(pop)

Critical Implementation Note: The field order change from V1 to V2+ is a common source of parsing bugs. In V1, the order is num_records, content_flags, locale_flags. In V2+, the order is num_records, locale_flags, content_flags, unk2, unk3.
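To make the field-order difference concrete, here is a small sketch of both header readers. The `BlockHeader` type and function names are hypothetical, and the fields are assumed to be little-endian. Each function also returns the number of bytes consumed so the caller knows where the FileDataID deltas start.

```rust
#[derive(Debug, PartialEq)]
pub struct BlockHeader {
    pub num_records: u32,
    pub content_flags: u32,
    pub locale_flags: u32,
    pub unk2: u32, // not present in V1, left at 0
    pub unk3: u8,  // not present in V1, left at 0
}

/// Reads a little-endian u32 at `offset`, or `None` if the input is too short.
fn read_u32_le(data: &[u8], offset: usize) -> Option<u32> {
    let b = data.get(offset..offset + 4)?;
    Some(u32::from_le_bytes([b[0], b[1], b[2], b[3]]))
}

/// V1 order: num_records, content_flags, locale_flags (12 bytes).
pub fn parse_block_header_v1(data: &[u8]) -> Option<(BlockHeader, usize)> {
    let header = BlockHeader {
        num_records: read_u32_le(data, 0)?,
        content_flags: read_u32_le(data, 4)?,
        locale_flags: read_u32_le(data, 8)?,
        unk2: 0,
        unk3: 0,
    };
    Some((header, 12))
}

/// V2+ order: num_records, locale_flags, content_flags, unk2, unk3 (17 bytes).
pub fn parse_block_header_v2(data: &[u8]) -> Option<(BlockHeader, usize)> {
    let header = BlockHeader {
        num_records: read_u32_le(data, 0)?,
        locale_flags: read_u32_le(data, 4)?,
        content_flags: read_u32_le(data, 8)?,
        unk2: read_u32_le(data, 12)?,
        unk3: *data.get(16)?,
    };
    Some((header, 17))
}
```

Running the V1 reader over V2 bytes does not fail outright: it silently swaps the two flag fields and leaves the cursor 5 bytes short, which is why the mistake tends to surface later as garbage FileDataIDs.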

V4 Extended Content Flags

V4 (Build 58221+) extends content flags to 40 bits, increasing the block header to 18 bytes (the content_flags field grows from 4 to 5 bytes). The 40-bit value is read as a u32 (4 bytes) plus a u8 (1 byte):

uint32_t content_flags_low;   // Bits 0-31
uint8_t  content_flags_high;  // Bits 32-39
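// Combined: content_flags = content_flags_low | ((uint64_t)content_flags_high << 32)
// (widen to 64 bits before shifting; shifting a promoted uint8_t by 32 is undefined in C)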
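The same combination in Rust, as a minimal sketch. The function names are hypothetical, and the 5-byte reader assumes the u32 is stored little-endian and immediately followed by the high byte.

```rust
/// Combines the low 32 bits and the high 8 bits into one 40-bit value.
pub fn combine_content_flags(low: u32, high: u8) -> u64 {
    // Widen both parts first so the shift happens in 64-bit arithmetic.
    u64::from(low) | (u64::from(high) << 32)
}

/// Reads the 5-byte on-disk form: little-endian u32, then one u8.
pub fn read_content_flags_40(bytes: [u8; 5]) -> u64 {
    let low = u32::from_le_bytes([bytes[0], bytes[1], bytes[2], bytes[3]]);
    combine_content_flags(low, bytes[4])
}
```

For example, a low word of 0x10000008 with a high byte of 0x01 yields 0x0110000008.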

Record Formats

Old Format (Interleaved)

struct RootRecordInterleaved {   // referenced by RootBlockHeaderV1 above
    uint8_t content_key[16];     // MD5 content key
    uint8_t name_hash[8];        // Jenkins96 name hash (optional)
};

New Format (Separated)

struct RootRecordNew {
    // Arrays stored separately
    uint8_t content_keys[num_records][16];
    uint8_t name_hashes[num_records][8];  // Optional
};
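The two layouts carry the same data in a different order. The sketch below reads both into one common record type; the `Record` type and function names are hypothetical, and `has_name_hash` stands in for whatever flag check the caller uses to decide whether name hashes are present.

```rust
#[derive(Debug, PartialEq)]
pub struct Record {
    pub content_key: [u8; 16],
    pub name_hash: Option<[u8; 8]>,
}

fn make_record(key: &[u8], hash: Option<&[u8]>) -> Record {
    let mut content_key = [0u8; 16];
    content_key.copy_from_slice(key);
    let name_hash = hash.map(|h| {
        let mut out = [0u8; 8];
        out.copy_from_slice(h);
        out
    });
    Record { content_key, name_hash }
}

/// Old layout: (content_key, name_hash) repeated once per record.
pub fn read_interleaved(data: &[u8], n: usize, has_name_hash: bool) -> Option<Vec<Record>> {
    let stride = if has_name_hash { 24 } else { 16 };
    let mut records = Vec::new();
    for i in 0..n {
        let rec = data.get(i * stride..(i + 1) * stride)?;
        let hash = if has_name_hash { Some(&rec[16..24]) } else { None };
        records.push(make_record(&rec[..16], hash));
    }
    Some(records)
}

/// New layout: all content keys first, then all name hashes.
pub fn read_separated(data: &[u8], n: usize, has_name_hash: bool) -> Option<Vec<Record>> {
    let keys_len = n.checked_mul(16)?;
    let keys = data.get(..keys_len)?;
    let hashes = if has_name_hash {
        Some(data.get(keys_len..keys_len.checked_add(n.checked_mul(8)?)?)?)
    } else {
        None
    };
    let records = (0..n)
        .map(|i| {
            let hash = hashes.map(|h| &h[i * 8..(i + 1) * 8]);
            make_record(&keys[i * 16..(i + 1) * 16], hash)
        })
        .collect();
    Some(records)
}
```

Both readers return `None` on truncated input instead of panicking, which is useful when `num_records` comes from a header that was parsed with the wrong layout.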

Content Flags

Content flags specify platform, architecture, and file attributes:

32-bit Flags (v2-v3)

Values match CascLib (CascLib.h), TACTSharp, and WoWDev wiki:

Value       Flag                Description
0x00000004  Install             Install manifest entry
0x00000008  LoadOnWindows       Windows platform
0x00000010  LoadOnMacOS         macOS platform
0x00000020  x86_32              32-bit x86 architecture
0x00000040  x86_64              64-bit x86 architecture
0x00000080  LowViolence         Censored content
0x00000100  DoNotLoad           Skip file
0x00000800  UpdatePlugin        Launcher plugin
0x00008000  Arm64               ARM64 architecture
0x08000000  Encrypted           Encrypted content
0x10000000  NoNameHash          No name hash in block
0x20000000  UncommonResolution  Non-standard resolution
0x40000000  Bundle              Bundled content
0x80000000  NoCompression       Uncompressed

40-bit Flags (v4+)

Build 58221+ extends to 40 bits, stored as u32 + u8:

  • Bits 0-31: Standard content flags (same as v2/v3)

  • Bits 32-39: Extended flags (single byte, shifted left by 32)

Common values:

  • 0x00000000: All platforms, default

  • 0x00000008: Windows only

  • 0x00000010: macOS only

  • 0x08000000: Encrypted content

  • 0x10000000: No name hash present
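Since each flag is a single bit, testing a block's flags is a mask operation. The sketch below copies the values from the 32-bit table; the constant and function names are hypothetical.

```rust
/// (bit, name) pairs taken from the 32-bit content flag table.
pub const CONTENT_FLAG_NAMES: [(u32, &str); 14] = [
    (0x0000_0004, "Install"),
    (0x0000_0008, "LoadOnWindows"),
    (0x0000_0010, "LoadOnMacOS"),
    (0x0000_0020, "x86_32"),
    (0x0000_0040, "x86_64"),
    (0x0000_0080, "LowViolence"),
    (0x0000_0100, "DoNotLoad"),
    (0x0000_0800, "UpdatePlugin"),
    (0x0000_8000, "Arm64"),
    (0x0800_0000, "Encrypted"),
    (0x1000_0000, "NoNameHash"),
    (0x2000_0000, "UncommonResolution"),
    (0x4000_0000, "Bundle"),
    (0x8000_0000, "NoCompression"),
];

pub const NO_NAME_HASH: u32 = 0x1000_0000;

/// True when the block stores a name hash array after its content keys.
pub fn has_name_hashes(content_flags: u32) -> bool {
    (content_flags & NO_NAME_HASH) == 0
}

/// Lists the names of all flags set in `content_flags`, in table order.
pub fn describe_content_flags(content_flags: u32) -> Vec<&'static str> {
    CONTENT_FLAG_NAMES
        .iter()
        .filter(|(bit, _)| (content_flags & *bit) != 0)
        .map(|(_, name)| *name)
        .collect()
}
```

`has_name_hashes` is the check a block reader would use to decide whether to expect the optional name hash array.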

Locale Flags

32-bit field representing language/region:

Value       Locale  Description
0x00000002  enUS    English (US)
0x00000004  koKR    Korean
0x00000010  frFR    French
0x00000020  deDE    German
0x00000040  zhCN    Chinese (Simplified)
0x00000080  esES    Spanish (Spain)
0x00000100  zhTW    Chinese (Traditional)
0x00000200  enGB    English (UK)
0x00000400  enCN    English (China)
0x00000800  enTW    English (Taiwan)
0x00001000  esMX    Spanish (Mexico)
0x00002000  ruRU    Russian
0x00004000  ptBR    Portuguese (Brazil)
0x00008000  itIT    Italian
0x00010000  ptPT    Portuguese (Portugal)
0xFFFFFFFF  All     All locales
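Locale filtering is likewise a bitmask test. The following sketch assumes a block applies to a client when it shares at least one bit with the requested locale mask; that matching rule and the names used here are assumptions, not taken from a specific client.

```rust
pub const LOCALE_EN_US: u32 = 0x0000_0002;
pub const LOCALE_FR_FR: u32 = 0x0000_0010;
pub const LOCALE_DE_DE: u32 = 0x0000_0020;
pub const LOCALE_ALL: u32 = 0xFFFF_FFFF;

/// True when the block's locale mask overlaps the locales the caller wants.
pub fn block_matches_locale(block_locale_flags: u32, wanted: u32) -> bool {
    (block_locale_flags & wanted) != 0
}
```

A block tagged with 0xFFFFFFFF matches every request, while a block tagged deDE | frFR (0x30) is skipped for an enUS client.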

FileDataID Delta Encoding

FileDataIDs use delta encoding for compression:

/// Decodes a block's delta-encoded FileDataIDs into absolute IDs.
fn decode_file_data_ids(deltas: &[i32]) -> Vec<u32> {
    let mut ids = Vec::with_capacity(deltas.len());
    // The ID the next entry would get if its delta were 0. Starting at 0
    // makes the first delta act as a direct value, not a delta.
    let mut next_id = 0u32;

    for &delta in deltas {
        // Wrapping arithmetic avoids an overflow panic on extreme or
        // malformed deltas.
        let id = next_id.wrapping_add_signed(delta);
        ids.push(id);

        // Important: the next delta is relative to this ID plus one
        next_id = id.wrapping_add(1);
    }

    ids
}

Note: After each entry the decoder advances the running ID by 1, then applies the next delta. A run of consecutive FileDataIDs is therefore stored as a run of zero deltas.
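As a worked example, the inverse operation makes the "plus one" rule easy to see. This encoder is a sketch written for illustration (the function name is hypothetical); it is not taken from any existing tool.

```rust
/// Produces the delta sequence a decoder would turn back into `ids`.
pub fn encode_file_data_ids(ids: &[u32]) -> Vec<i32> {
    let mut deltas = Vec::with_capacity(ids.len());
    // The ID the decoder expects next if the delta is 0.
    let mut next_id = 0u32;

    for &id in ids {
        // Distance from the expected ID, reinterpreted as a signed value.
        deltas.push(id.wrapping_sub(next_id) as i32);
        next_id = id.wrapping_add(1);
    }

    deltas
}
```

The IDs 100, 101, 102, 108 encode as 100, 0, 0, 5: the first delta is the ID itself, consecutive IDs cost nothing, and the gap from 102 to 108 is 5 because the decoder has already advanced to 103.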

Lookup Process

  1. Parse Root file: Decompress BLTE, read header and blocks
  2. Filter by flags: Select blocks matching desired locale/content
  3. Find FileDataID: Binary search or iterate through blocks
  4. Extract content key: Retrieve corresponding MD5 hash
  5. Resolve via encoding: Use content key to find encoding key

Name Hash Calculation

For named files, Jenkins96 hash (hashlittle2) is used:

fn jenkins96_hash(filename: &str) -> u64 {
    // Normalize path: ASCII uppercase with backslashes (matching CascLib's
    // NormalizeFileName_UpperBkSlash)
    let normalized = filename.to_ascii_uppercase().replace('/', "\\");
    let bytes = normalized.as_bytes();

    // Jenkins hashlittle2 with pc=0, pb=0.
    // `Jenkins96` is assumed to be provided by the surrounding crate; it is
    // not a standard library type.
    let hash = Jenkins96::hash(bytes);

    // Return (pc << 32) | pb directly (no word swap)
    // Matches CascLib's CalcNormNameHash
    hash.hash64
}

Important Jenkins96 Details:

  • Paths are normalized to uppercase with backslashes (not forward slashes)

  • The hash is 64-bit (8 bytes) not 96-bit despite the name

  • Some blocks have NoNameHash flag, omitting name hashes entirely

  • Uses Bob Jenkins’ lookup3.c algorithm (hashlittle2 function)

  • Processes data in 12-byte chunks with little-endian byte order

  • The 0xDEADBEEF constant is added during initialization

  • Python validation tool available in cascette-py project: https://github.com/wowemulation-dev/cascette-py

Example Hashes:

  • Empty string: 0xDEADBEEFDEADBEEF

  • Interface\Icons\INV_Misc_QuestionMark.blp: 0x9EB59E3C76124837
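For readers without a Jenkins implementation at hand, the details listed above can be assembled into a self-contained sketch. The rotation constants follow Bob Jenkins' public-domain lookup3.c; the function names and the `name_hash` wrapper are hypothetical, and ASCII-only uppercasing is an assumption about the normalization.

```rust
/// lookup3 `mix`: reversibly mixes the three state words.
fn mix(mut a: u32, mut b: u32, mut c: u32) -> (u32, u32, u32) {
    a = a.wrapping_sub(c); a ^= c.rotate_left(4);  c = c.wrapping_add(b);
    b = b.wrapping_sub(a); b ^= a.rotate_left(6);  a = a.wrapping_add(c);
    c = c.wrapping_sub(b); c ^= b.rotate_left(8);  b = b.wrapping_add(a);
    a = a.wrapping_sub(c); a ^= c.rotate_left(16); c = c.wrapping_add(b);
    b = b.wrapping_sub(a); b ^= a.rotate_left(19); a = a.wrapping_add(c);
    c = c.wrapping_sub(b); c ^= b.rotate_left(4);  b = b.wrapping_add(a);
    (a, b, c)
}

/// lookup3 `final`: last mixing step applied to the tail block.
fn final_mix(mut a: u32, mut b: u32, mut c: u32) -> (u32, u32, u32) {
    c ^= b; c = c.wrapping_sub(b.rotate_left(14));
    a ^= c; a = a.wrapping_sub(c.rotate_left(11));
    b ^= a; b = b.wrapping_sub(a.rotate_left(25));
    c ^= b; c = c.wrapping_sub(b.rotate_left(16));
    a ^= c; a = a.wrapping_sub(c.rotate_left(4));
    b ^= a; b = b.wrapping_sub(a.rotate_left(14));
    c ^= b; c = c.wrapping_sub(b.rotate_left(24));
    (a, b, c)
}

fn le32(k: &[u8]) -> u32 {
    u32::from_le_bytes([k[0], k[1], k[2], k[3]])
}

/// hashlittle2: returns (pc, pb), the primary and secondary 32-bit hashes.
pub fn hashlittle2(key: &[u8], pc: u32, pb: u32) -> (u32, u32) {
    let init = 0xdead_beef_u32.wrapping_add(key.len() as u32).wrapping_add(pc);
    let (mut a, mut b, mut c) = (init, init, init.wrapping_add(pb));

    // Consume 12-byte chunks, always leaving 1..=12 bytes for the tail.
    let mut rest = key;
    while rest.len() > 12 {
        a = a.wrapping_add(le32(&rest[0..4]));
        b = b.wrapping_add(le32(&rest[4..8]));
        c = c.wrapping_add(le32(&rest[8..12]));
        let mixed = mix(a, b, c);
        a = mixed.0;
        b = mixed.1;
        c = mixed.2;
        rest = &rest[12..];
    }

    // Empty input skips the final mix entirely.
    if rest.is_empty() {
        return (c, b);
    }

    // Zero-padding the tail is equivalent to lookup3's byte-wise switch.
    let mut tail = [0u8; 12];
    tail[..rest.len()].copy_from_slice(rest);
    a = a.wrapping_add(le32(&tail[0..4]));
    b = b.wrapping_add(le32(&tail[4..8]));
    c = c.wrapping_add(le32(&tail[8..12]));
    let (_, b, c) = final_mix(a, b, c);
    (c, b)
}

/// Normalizes a path and returns (pc << 32) | pb.
pub fn name_hash(path: &str) -> u64 {
    let normalized = path.to_ascii_uppercase().replace('/', "\\");
    let (pc, pb) = hashlittle2(normalized.as_bytes(), 0, 0);
    (u64::from(pc) << 32) | u64::from(pb)
}
```

With an empty input both words stay at their 0xDEADBEEF initial value, which gives the empty-string hash listed above. Because of the normalization step, `name_hash("interface/icons/x.blp")` and `name_hash("INTERFACE\\ICONS\\X.BLP")` return the same value.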

Implementation Example

// RootHeader, RootBlock, MD5Hash, and matches_flags are assumed to be
// defined elsewhere in the parser.
struct RootFile {
    header: RootHeader,
    blocks: Vec<RootBlock>,
}

impl RootFile {
    pub fn find_file(&self, file_data_id: u32) -> Option<MD5Hash> {
        for block in &self.blocks {
            // Check if block matches desired flags
            if !self.matches_flags(block) {
                continue;
            }

            // Search for FileDataID
            if let Some(idx) = block.find_file_index(file_data_id) {
                return Some(block.records[idx].content_key);
            }
        }
        None
    }
}
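A self-contained variant of the lookup is sketched below for experimentation. All types here are simplified stand-ins rather than the parser's real ones, the flag match is reduced to a locale mask, and the search is a linear scan because the sketch does not assume the decoded IDs are sorted.

```rust
pub type Md5Hash = [u8; 16];

/// Simplified block: decoded IDs and content keys in matching order.
pub struct Block {
    pub locale_flags: u32,
    pub file_data_ids: Vec<u32>,
    pub content_keys: Vec<Md5Hash>,
}

pub struct Root {
    /// Locales the caller is interested in (assumption: any overlap matches).
    pub locale_mask: u32,
    pub blocks: Vec<Block>,
}

impl Root {
    /// Returns the content key of the first matching block that lists the ID.
    pub fn find_file(&self, file_data_id: u32) -> Option<Md5Hash> {
        self.blocks
            .iter()
            .filter(|block| (block.locale_flags & self.locale_mask) != 0)
            .find_map(|block| {
                let idx = block
                    .file_data_ids
                    .iter()
                    .position(|&id| id == file_data_id)?;
                block.content_keys.get(idx).copied()
            })
    }
}
```

Because the same FileDataID can appear in several blocks with different locales, the result depends on `locale_mask`: a deDE-only block is skipped for an enUS request even when it lists the ID.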

Version History

  • Build 18125 (6.0.1): Initial CASC Root format (V1)

    • No magic header
    • 12-byte block header: num_records, content_flags, locale_flags
    • Interleaved record format: (ckey, name_hash) per record
  • Build 30080 (8.2.0): Added MFST magic signature (V2)

    • MFST/TSFM magic header with file counts
    • 17-byte block header: num_records, locale_flags, content_flags_1, content_flags_2, content_flags_3
    • Field order changed: locale_flags moved before content_flags
    • Combined content flags: content_flags_1 | content_flags_2 | (content_flags_3 << 17)
    • Separated array format: all ckeys, then all name_hashes
  • Build 50893 (10.1.7): Added header_size/version fields (V3)

    • Extended header with header_size, version, padding fields
    • Same 17-byte block header format as V2
  • Build 58221 (11.1.0): Extended content flags to 40 bits (V4)

    • 18-byte block header (content_flags grows from 4 to 5 bytes)
    • 40-bit content flags stored as u32 + u8

Version Detection Code

#[derive(Debug, PartialEq)]
enum RootVersion {
    Invalid,
    V1,
    V2,
    V3,
    V4,
}

fn detect_root_version(data: &[u8]) -> RootVersion {
    if data.len() < 4 {
        return RootVersion::Invalid;
    }

    // Check for MFST or TSFM magic
    let magic = &data[0..4];
    if magic != b"MFST" && magic != b"TSFM" {
        return RootVersion::V1; // Pre-30080, no magic
    }

    // The shortest header with magic is 12 bytes (v2). Reject truncated
    // input here instead of panicking on the slices below.
    if data.len() < 12 {
        return RootVersion::Invalid;
    }

    // Read the two u32 values after magic
    let value1 = u32::from_le_bytes(data[4..8].try_into().unwrap());
    let value2 = u32::from_le_bytes(data[8..12].try_into().unwrap());

    // Heuristic: header_size in [16, 100) and version < 10
    // indicates v3+ with explicit header_size/version fields
    if (16..100).contains(&value1) && value2 < 10 {
        match value2 {
            4.. => RootVersion::V4,
            _ => RootVersion::V3, // version 1-3 all use V2/V3 block format
        }
    } else {
        RootVersion::V2 // 30080+, value1 is total_file_count
    }
}

Parser Implementation Status

The Python parser (cascette-py) currently supports:

  • Version detection (MFST/TSFM magic)

  • Version 1-3 parsing

  • Block-based extraction

  • Content key retrieval

  • Delta encoding detection (identifies but doesn’t decode)

The parser can extract FileDataID to content key mappings from all current WoW root file versions.

See https://github.com/wowemulation-dev/cascette-py for the Python implementation.

Common Issues

  1. V2 block header size: V2+ uses a 17-byte block header, not 12 bytes like V1. Using the wrong header size causes all subsequent parsing to fail with garbage FileDataIDs and content keys.

  2. V2 field order change: V2+ swapped locale_flags and content_flags positions. In V1: num_records, content_flags, locale_flags. In V2+: num_records, locale_flags, content_flags, unk2, unk3.

  3. Multiple matches: Same file may exist in multiple blocks with different locales

  4. Missing entries: Not all FileDataIDs have corresponding entries

  5. Flag interpretation: Game-specific flag meanings vary

  6. Delta overflow: Adding a large delta to the running FileDataID can overflow a 32-bit integer. Use wrapping or checked arithmetic when decoding.

Implementation Notes

Version Detection Heuristic

The version detection uses value2 < 10 to identify extended headers, which is broader than the strict matches!(value2, 1..=4) check. Version 1 is accepted and maps to V2 block format (17-byte header, locale_flags first). This matches CascLib and TACTSharp behavior. The heuristic may need tightening if future versions use values in the 5-9 range for non-version purposes.

Block Header Dispatch

The current dispatch is verified correct:

  • Plain V1 files (no MFST/TSFM magic) use the 12-byte header (content_flags first)
  • All MFST/TSFM files (including Classic Era) use the 17-byte header (locale_flags first)
  • V4 files use the 18-byte header (40-bit content flags)

The V2 17-byte format applies to all MFST/TSFM files regardless of the header version field value. The 12-byte format is only used for pre-magic V1 files.
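The dispatch rules above reduce to a small lookup. This is a sketch with a hypothetical `BlockLayout` enum; it only encodes the three header sizes stated in this section.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum BlockLayout {
    /// No MFST/TSFM magic: content_flags first.
    V1,
    /// MFST/TSFM magic, 32-bit content flags: locale_flags first.
    V2,
    /// MFST/TSFM magic, 40-bit content flags.
    V4,
}

/// Size in bytes of the fixed block header for each layout.
pub fn block_header_size(layout: BlockLayout) -> usize {
    match layout {
        BlockLayout::V1 => 12,
        BlockLayout::V2 => 17,
        BlockLayout::V4 => 18,
    }
}

/// Picks a layout from whether magic was found and whether the
/// file uses 40-bit content flags.
pub fn select_layout(has_magic: bool, has_40_bit_flags: bool) -> BlockLayout {
    match (has_magic, has_40_bit_flags) {
        (false, _) => BlockLayout::V1,
        (true, false) => BlockLayout::V2,
        (true, true) => BlockLayout::V4,
    }
}
```

Keeping the layout separate from the header version field reflects the point above: the version value alone does not pick the 12-byte format, only the absence of magic does.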

References