Archive Files and Indices
CASC/TACT archives are container files that store game content in a packed format. They work with index files to enable efficient content retrieval without unpacking entire archives. The system uses different formats for network (TACT) and local storage (CASC).
Overview
The archive system provides:
- Bulk storage of game assets in .archive files
- Index files for fast content location
- Support for partial downloads via HTTP range requests
- Deduplication through content addressing
Archive Files
CDN Archives vs Local Archives
CDN Archives (TACT - served over HTTP):
- Named using 32-character hexadecimal hash keys (e.g., 86b6b0daf3d8ef68271b15567c37300c)
- Accessed via URL path: /tpr/wow/data/{hash[:2]}/{hash[2:4]}/{hash}
- Paired with Archive Index files (.index) for content location
- Single BLTE-encoded container format
- Part of the TACT (Trusted Application Content Transfer) protocol
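The URL path rule above can be sketched as a small helper. This is a minimal illustration, not the cascette API: the function name `cdn_data_path` and the `index` flag are hypothetical, and the `/tpr/wow/data` prefix is taken from the example path above.

```rust
/// Builds the CDN path for an archive (or its .index file) from a hex hash.
/// Hypothetical helper: returns None unless `hash` is exactly 32 hex characters.
fn cdn_data_path(hash: &str, index: bool) -> Option<String> {
    if hash.len() != 32 || !hash.bytes().all(|b| b.is_ascii_hexdigit()) {
        return None;
    }
    // The first two directory levels are the first two byte pairs of the hash.
    let suffix = if index { ".index" } else { "" };
    Some(format!(
        "/tpr/wow/data/{}/{}/{}{}",
        &hash[0..2],
        &hash[2..4],
        hash,
        suffix
    ))
}
```

Passing `true` for `index` appends the `.index` suffix used by the paired archive index file.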
Local Client Archives (CASC - stored on disk):
- Named with numeric indices: data.001, data.002, etc.
- Use IDX Journal files (.idx) for local content access
- Multiple BLTE files concatenated together
- Part of the CASC (Content Addressable Storage Container) system
- Optimized for memory-mapped access
CDN Archive Structure
CDN archives are single BLTE-encoded containers, while local archives contain multiple BLTE files:
CDN Archive Format (TACT): Local Archive Format (CASC):
┌──────────────────┐ ┌──────────────────┐
│ BLTE Container │ │ BLTE File 1 │
├──────────────────┤ ├──────────────────┤
│ Header & Blocks │ │ BLTE File 2 │
├──────────────────┤ ├──────────────────┤
│ Content Blocks │ │ BLTE File 3 │
│ (concatenated) │ │ ... │
└──────────────────┘ └──────────────────┘
Verified Archive Characteristics
Based on examination of sample archives:
- File sizes: Range from ~7MB to 268MB when compressed
- Compression ratios: 4.9x to 190x compression achieved via BLTE
- Content types: WDB Cache files (WDC3), textures, models, and other game assets
- Decompressed content: Much smaller than archive size (1-2MB typical)
- Access pattern: Content addressed via hash keys in index files
CRITICAL: Two Completely Different Index Systems
⚠️ CDN Archive Index (.index) vs Local Storage Index (.idx)
NEVER CONFUSE THESE TWO FORMATS - THEY ARE COMPLETELY DIFFERENT:
- CDN Archive Index Files (.index): TACT format with 28-byte footer, variable-length encoding keys
- Local Storage Index Files (.idx): CASC format with header, fixed 9-byte content key buckets
These systems serve different purposes and use entirely different formats, key types, and data structures.
CDN Archive Index Format (TACT Protocol)
File Extension: .index
Location: Downloaded from CDN
Purpose: Maps variable-length encoding keys to CDN archive locations
Key Type: Encoding keys (from Encoding file)
Key Length: Variable, as specified in footer’s ekey_length field (typically 16 bytes, sometimes 9)
Implementation: cascette-formats/src/archive/index.rs
Archive Index Files (.index) - TACT Protocol
Based on analysis of actual CDN index files from various WoW builds.
CDN archive indexes use a chunk-based format with footer metadata:
Archive Index Structure
Index File Layout:
┌────────────────┐
│ Data Chunks │ <- 4KB chunks containing entries
│ (4096 bytes) │
├────────────────┤
│ ... │
├────────────────┤
│ Last Chunk │ <- Table of contents + entries
├────────────────┤
│ Footer │ <- Metadata (variable length)
└────────────────┘
CDN Index Entry Format (Variable Length)
struct CDNArchiveIndexEntry {
uint8_t ekey[ekey_length]; // Encoding key (variable length from footer)
uint32_t encoded_size; // BLTE encoded size (big-endian)
uint32_t archive_offset; // Offset in archive (big-endian)
};
Entry Size: Variable = ekey_length + size_bytes + offset_bytes (from footer)
Typical Sizes:
- With 16-byte keys: 16 + 4 + 4 = 24 bytes per entry
- With 9-byte keys: 9 + 4 + 4 = 17 bytes per entry
Key Properties:
- Encoding key length specified in footer’s ekey_length field
- All multi-byte fields use big-endian encoding
- NEVER assume fixed 9-byte keys - always read from footer
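To make the variable-width layout concrete, here is a minimal sketch of decoding a single entry. The names `IndexEntry`, `read_be`, and `parse_entry` are hypothetical; the field order (key, size, offset) and big-endian encoding follow the struct above, and the three widths are assumed to have been read from the footer already.

```rust
/// One decoded CDN index entry. Field widths come from the footer.
#[derive(Debug, PartialEq)]
struct IndexEntry {
    ekey: Vec<u8>,
    encoded_size: u64,
    archive_offset: u64,
}

/// Reads an unsigned big-endian integer of up to 8 bytes.
fn read_be(bytes: &[u8]) -> u64 {
    bytes.iter().fold(0u64, |acc, &b| (acc << 8) | u64::from(b))
}

/// Decodes one entry laid out as ekey, then size, then offset.
/// Returns None if `data` is shorter than one entry or a width exceeds 8 bytes.
fn parse_entry(
    data: &[u8],
    ekey_length: usize,
    size_bytes: usize,
    offset_bytes: usize,
) -> Option<IndexEntry> {
    if size_bytes > 8 || offset_bytes > 8 {
        return None;
    }
    let entry_len = ekey_length + size_bytes + offset_bytes;
    let entry = data.get(..entry_len)?;
    let (ekey, rest) = entry.split_at(ekey_length);
    let (size, offset) = rest.split_at(size_bytes);
    Some(IndexEntry {
        ekey: ekey.to_vec(),
        encoded_size: read_be(size),
        archive_offset: read_be(offset),
    })
}
```

Widening both numeric fields to `u64` lets the same code handle 4-, 5-, and 6-byte offsets without separate paths.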
Archive Index Footer (TACT)
Archive Index files use a 28-byte footer at the end of the file:
struct ArchiveIndexFooter { // 28 bytes total
uint8_t toc_hash[8]; // MD5(toc_keys || block_hashes)[:footer_hash_bytes]
uint8_t version; // Must be 0 or 1
uint8_t reserved[2]; // Must be [0, 0]
uint8_t page_size_kb; // Must be 4 (4KB pages)
uint8_t offset_bytes; // Archive offset field size (4, 5, or 6)
uint8_t size_bytes; // Compressed size field size (always 4)
uint8_t ekey_length; // EKey length in bytes (16 for full MD5)
uint8_t footer_hash_bytes; // Footer hash length (always 8)
uint32_t element_count; // Number of entries (little-endian - special case!)
uint8_t footer_hash[8]; // MD5 footer validation (first 8 bytes)
};
Verified Footer Properties:
- Standard values: offset_bytes=4, size_bytes=4, ekey_length=16 (1-16 valid)
- offset_bytes can be 4 (regular archives), 5 (archives >4GB), or 6 (archive-groups: 2-byte archive index + 4-byte offset)
- Page/chunk size consistently 4096 bytes
- Item length consistently 24 bytes (0x18)
- Archive filename = MD5 hash of the footer
- Footer validation uses MD5 hashing (first 8 bytes of hash)
- Mixed endianness: element_count field is little-endian while all other multi-byte fields are big-endian
- TOC hash field is present but not validated in practice. No known reference implementation (CascLib, TACT.Net, rustycasc) validates this field. Testing against real files shows the stored values do not match any standard hash algorithm applied to the TOC data.
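The footer layout and the constraints listed above can be sketched as a parser. This is an illustrative sketch only: `Footer` and `parse_footer` are hypothetical names, the byte positions are derived from the struct definition above, and the two hash fields are copied out without being verified.

```rust
/// Parsed view of the 28-byte footer (hypothetical type for illustration).
#[derive(Debug)]
struct Footer {
    toc_hash: [u8; 8],
    version: u8,
    page_size_kb: u8,
    offset_bytes: u8,
    size_bytes: u8,
    ekey_length: u8,
    footer_hash_bytes: u8,
    element_count: u32,
    footer_hash: [u8; 8],
}

/// Reads the footer from the last 28 bytes of an index file.
fn parse_footer(file: &[u8]) -> Result<Footer, String> {
    let start = file
        .len()
        .checked_sub(28)
        .ok_or("file is shorter than the 28-byte footer")?;
    let f = &file[start..];

    if f[8] > 1 {
        return Err(format!("unsupported version {}", f[8]));
    }
    if f[9] != 0 || f[10] != 0 {
        return Err("reserved bytes must be zero".to_string());
    }
    if f[11] != 4 {
        return Err(format!("unexpected page size {} KB", f[11]));
    }
    if !(4..=6).contains(&f[12]) {
        return Err(format!("unexpected offset_bytes {}", f[12]));
    }
    if !(1..=16).contains(&f[14]) {
        return Err(format!("unexpected ekey_length {}", f[14]));
    }

    let mut toc_hash = [0u8; 8];
    toc_hash.copy_from_slice(&f[0..8]);
    let mut footer_hash = [0u8; 8];
    footer_hash.copy_from_slice(&f[20..28]);

    Ok(Footer {
        toc_hash,
        version: f[8],
        page_size_kb: f[11],
        offset_bytes: f[12],
        size_bytes: f[13],
        ekey_length: f[14],
        footer_hash_bytes: f[15],
        // The one little-endian field in the footer.
        element_count: u32::from_le_bytes([f[16], f[17], f[18], f[19]]),
        footer_hash,
    })
}
```

Hash verification is left out on purpose, since it needs an MD5 implementation; a caller would check `footer_hash` separately.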
Implementation Notes:
- Extended Block Offsets: The agent logs “Archive w/ Extended Block Offset Found” for archive index entries that use larger-than-4-byte offsets (for archives exceeding 4GB)
- Archive Count Limit: The agent has a casc_supports_1023_archives configuration flag, indicating a maximum of 1023 archives per CASC storage
Sample Analysis Results
File Sizes Observed:
- Small indexes: ~8KB (few hundred entries)
- Medium indexes: ~50-200KB (thousands of entries)
- Large indexes: ~300KB+ (tens of thousands of entries)
Index Distribution (from sample builds):
- WoW retail: 400-1400+ archives per build
- WoW Classic: 1000-1400+ archives per build
- Beta builds: 400-800 archives per build
Chunk Structure:
- All indexes use 4KB chunks. Max entries per chunk = floor(4096 / (ekey_length + offset_bytes + size_bytes)). With default 16+4+4 fields: 170 entries per chunk.
- Table of contents (TOC) is stored after data chunks and contains two sections:
  - Last encoding key of each data chunk (for binary search)
  - Per-block MD5 hash of each data chunk (truncated to footer_hash_bytes)
- TOC hash = MD5(toc_keys || block_hashes)[:footer_hash_bytes]
- Chunk structure enables streaming and memory-efficient processing
- Chunks are padded with zeros to maintain 4KB alignment
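As a rough sketch of how the chunk arithmetic and the TOC's last-key list fit together: the function names below are hypothetical, and the TOC keys are assumed to be already loaded into memory in ascending order, one per data chunk.

```rust
const CHUNK_SIZE: usize = 4096;

/// Whole entries that fit in one 4KB chunk (integer division floors).
/// Returns None if all three widths are zero.
fn entries_per_chunk(ekey_length: usize, offset_bytes: usize, size_bytes: usize) -> Option<usize> {
    CHUNK_SIZE.checked_div(ekey_length + offset_bytes + size_bytes)
}

/// Picks the data chunk that could hold `ekey`, given the last key of each
/// chunk. Returns None when the key sorts after every chunk's last key.
fn chunk_for_key(toc_last_keys: &[Vec<u8>], ekey: &[u8]) -> Option<usize> {
    // Index of the first chunk whose last key is >= the target key.
    let idx = toc_last_keys.partition_point(|last| last.as_slice() < ekey);
    if idx < toc_last_keys.len() {
        Some(idx)
    } else {
        None
    }
}
```

Only the selected chunk then needs to be scanned or binary searched, which is what makes the 4KB layout friendly to streaming.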
Archive Index Access Pattern
CDN URL Format:
https://cdn.domain.com/tpr/wow/data/{hash[:2]}/{hash[2:4]}/{hash}.index
Lookup Process:
- Get archive content key from CDN configuration
- Append ‘.index’ to form index URL
- Fetch and parse index file
- Search entries for target EKey
- Use offset/size to retrieve from corresponding .archive file
Self-Referential Naming:
The archive index filename (hash) is the MD5 of its own footer structure, providing a unique identifier that validates the index contents.
Local Storage Index Format (.idx files)
File Extension: .idx
Location: Client-side storage directory (Data/data/)
Purpose: Maps content keys to local data file locations using bucket algorithm
Key Type: Content keys (MD5 hashes from Root file)
Key Length: ALWAYS 9 bytes (truncated for space efficiency in local storage)
Implementation: cascette-client-storage/src/index.rs
See the comparison table at the end of this document for a full side-by-side comparison.
IDX Journal Files (.idx) - CASC Local Storage
Local CASC storage uses IDX Journal files for indexing:
IDX Journal Structure
struct IDXJournalHeader { // 16 bytes + block table
uint32_t data_size; // Size of header data
uint32_t data_hash; // Jenkins hash validation
uint16_t version; // Journal version
uint8_t bucket; // Bucket ID (0x00-0xFF)
uint8_t unused; // Padding
uint8_t length_size; // Size field bytes
uint8_t location_size; // Location field bytes (5 = 1 archive + 4 offset)
uint8_t key_size; // Key field bytes (9 or 16)
uint8_t segment_bits; // Segment size bits
// Followed by block table entries
};
Key Differences from Archive Indexes:
- Bucket-based structure (256 buckets, 00-FF)
- Jenkins hash validation instead of footer hash
- Fixed key size (keys truncated to 9 bytes, not a length read from a footer)
- Header at start instead of footer at end
- One journal file per bucket
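A minimal sketch of reading the fixed header fields, assuming the fields are packed in the order of the struct above with no extra padding and that the multi-byte fields are little-endian. `IdxHeader` and `parse_idx_header` are hypothetical names; the Jenkins hash is read but not checked here.

```rust
/// Fixed part of an IDX journal header (hypothetical type for illustration).
#[derive(Debug, PartialEq)]
struct IdxHeader {
    data_size: u32,
    data_hash: u32,
    version: u16,
    bucket: u8,
    length_size: u8,
    location_size: u8,
    key_size: u8,
    segment_bits: u8,
}

/// Reads the header from the start of the file.
/// Returns None if fewer than 16 bytes are available.
fn parse_idx_header(data: &[u8]) -> Option<IdxHeader> {
    let h = data.get(..16)?;
    Some(IdxHeader {
        data_size: u32::from_le_bytes([h[0], h[1], h[2], h[3]]),
        data_hash: u32::from_le_bytes([h[4], h[5], h[6], h[7]]),
        version: u16::from_le_bytes([h[8], h[9]]),
        bucket: h[10],
        // h[11] is the unused padding byte.
        length_size: h[12],
        location_size: h[13],
        key_size: h[14],
        segment_bits: h[15],
    })
}
```

A caller would typically compare `version` against the expected value and decide whether to warn or reject.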
Loose Files Index
For files not in archives:
struct LooseFilesIndex {
uint32_t magic; // 'LIDX'
uint32_t version;
uint32_t entry_count;
struct Entry {
uint8_t encoding_key[16];
uint32_t file_size;
uint8_t file_hash[16]; // For verification
} entries[];
};
Archive Lookup Process
- Get encoding key: From encoding file lookup
- Check indices: Search all index files for key
- Locate in archive: Extract offset and size
- Retrieve data: Read from archive at offset
- Decompress: Process BLTE container
Implementation Example
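/// Footer-derived parameters needed for lookups.
struct ArchiveIndexHeader {
    key_size: u8, // key length stored in the index
}

struct ArchiveIndexEntry {
    key: Vec<u8>, // encoding key, truncated to key_size
    offset: u64,  // offset within the archive
    size: u32,    // BLTE-encoded size
}

struct ArchiveIndex {
    header: ArchiveIndexHeader,
    entries: Vec<ArchiveIndexEntry>, // must be sorted by key
}

impl ArchiveIndex {
    /// Returns (offset, size), or None if the key is absent or too short.
    pub fn find_file(&self, encoding_key: &[u8]) -> Option<(u64, u32)> {
        // Truncate search key to index key size (no panic on short input)
        let search_key = encoding_key.get(..self.header.key_size as usize)?;
        // Binary search entries (sorted by key)
        let idx = self
            .entries
            .binary_search_by(|e| e.key.as_slice().cmp(search_key))
            .ok()?;
        let entry = &self.entries[idx];
        Some((entry.offset, entry.size))
    }
}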
HTTP Range Requests
For CDN retrieval without downloading entire archives:
GET /data/5e/16/5e16b6ff530b1816c7b32296e0875ed4 HTTP/1.1
Host: cdn.example.com
Range: bytes=1048576-2097151
Response:
HTTP/1.1 206 Partial Content
Content-Range: bytes 1048576-2097151/134217728
Content-Length: 1048576
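The range in the request above follows directly from an index entry's offset and size. A small sketch, with `range_header` as a hypothetical helper name: HTTP byte ranges are inclusive, so the last byte is `offset + size - 1`.

```rust
/// Formats the value of an HTTP Range header for `size` bytes at `offset`.
/// Returns None for a zero-length request or if the end would overflow.
fn range_header(offset: u64, size: u32) -> Option<String> {
    if size == 0 {
        return None;
    }
    // Byte ranges are inclusive on both ends.
    let last = offset.checked_add(u64::from(size) - 1)?;
    Some(format!("bytes={}-{}", offset, last))
}
```

With the offset and size from the example (1048576 bytes starting at 1048576) this yields `bytes=1048576-2097151`.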
Archive Creation
When building archives:
- Group related files: Minimize seeks during loading
- Align boundaries: 4KB alignment for efficient I/O
- Order by access: Frequently accessed files first
- Compress individually: Each file is BLTE-encoded
- Update indices: Generate index entries
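For the alignment step, the arithmetic might look like the following sketch. The helper names are hypothetical, and the 4096-byte boundary is taken from the list above.

```rust
const ALIGNMENT: u64 = 4096;

/// Rounds `offset` up to the next multiple of 4096.
/// Returns None if rounding up would overflow u64.
fn align_up(offset: u64) -> Option<u64> {
    let rem = offset % ALIGNMENT;
    if rem == 0 {
        Some(offset)
    } else {
        offset.checked_add(ALIGNMENT - rem)
    }
}

/// Number of padding bytes needed after `len` bytes to reach the boundary.
fn padding_for(len: u64) -> u64 {
    (ALIGNMENT - len % ALIGNMENT) % ALIGNMENT
}
```

The outer modulo in `padding_for` makes an already aligned length need zero padding rather than a full 4096 bytes.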
Optimization Strategies
Memory Mapping
For local archives:
use memmap2::Mmap;

struct ArchiveReader {
    mmap: Mmap,
}

impl ArchiveReader {
    /// Returns the requested bytes, or None if the range falls outside the mapping.
    pub fn read_file(&self, offset: u64, size: u32) -> Option<&[u8]> {
        let start = usize::try_from(offset).ok()?;
        let end = start.checked_add(size as usize)?;
        self.mmap.get(start..end)
    }
}
Index Caching
Keep frequently used indices in memory:
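use std::collections::HashMap;
use std::sync::Arc;

use lru::LruCache; // assumes the `lru` crate

struct IndexCache {
    // Parsed indices, shared between readers
    indices: HashMap<String, Arc<ArchiveIndex>>,
    // Tracks recency so the least recently used index can be evicted
    lru: LruCache<String, ()>,
}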
Archive Validation
Checksum Verification
When checksums are present:
fn verify_file(data: &[u8], expected_checksum: &[u8; 16]) -> bool {
    let computed = md5::compute(data);
    computed.0 == *expected_checksum
}
Size Validation
Always verify extracted size matches expected:
fn check_size(decompressed: &[u8], expected_size: u32) -> Result<(), &'static str> {
    if decompressed.len() != expected_size as usize {
        return Err("Size mismatch");
    }
    Ok(())
}
Common Issues
- Key collisions: Truncated keys may collide (handle gracefully)
- Archive corruption: Verify checksums when available
- Missing indices: Some files may only exist as loose files
- Version mismatches: Handle different index versions
- Alignment padding: Account for alignment bytes
Archive Groups
Archive Groups are client-generated mega-indices that combine multiple CDN archive
indices into a single lookup structure, reducing search time from scanning hundreds
of individual .index files to a single binary search. They use 6-byte offset
fields (2-byte archive index + 4-byte offset) and are identified by archive-group
and patch-archive-group fields in CDN config.
See Archive-Groups for the full format specification.
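The 6-byte offset field can be split as in the sketch below. The function name is hypothetical, and both parts are assumed to be big-endian like the other index entry fields; the Archive-Groups specification is the authority on the exact layout.

```rust
/// Splits a 6-byte archive-group offset into (archive index, offset).
/// Assumption: 2-byte archive index first, then 4-byte offset, both big-endian.
fn split_group_offset(raw: [u8; 6]) -> (u16, u32) {
    let archive_index = u16::from_be_bytes([raw[0], raw[1]]);
    let offset = u32::from_be_bytes([raw[2], raw[3], raw[4], raw[5]]);
    (archive_index, offset)
}
```

The archive index would then select an entry from the archive list in the CDN config, and the offset applies within that archive.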
File Organization
Typical CASC repository structure:
data/
├── config/ # Configuration files
├── data/ # Archive files
│ ├── 00/
│ │ ├── 00/{hash}.archive
│ │ └── ...
│ └── ff/
│ └── ff/{hash}.archive
├── indices/ # Index files
│ ├── {hash}.index
│ └── ...
└── patch/ # Patch archives
Version History
CDN Archive Index Format (.index files)
The CDN Archive Index format currently has only one version:
Version 1 (Current)
- Footer Size: 28 bytes
- Location: End of file
- Features:
  - Variable-length encoding keys (footer’s ekey_length field)
  - 4KB chunk-based structure with table of contents
  - MD5 hash validation (footer hash and TOC hash)
  - Self-referential naming (filename = MD5 of footer)
  - Mixed endianness (element_count is little-endian, others big-endian)
  - Typical entry size: 24 bytes (16-byte key + 4-byte size + 4-byte offset)
Version Detection
The version field is at offset 8 in the 28-byte footer. All known CDN archive indices use version 1.
Implementation Status
- cascette-formats: Full support for version 1 with parser
- Archive-groups: Client-side mega-indices combine multiple CDN indices (6-byte offset variant)
Local Storage Index Format (.idx files)
The Local Storage Index (IDX Journal) format currently has only one version:
Version 7 (Current - IDX Journal v7)
- Header Size: 16 bytes
- Location: Start of file
- Features:
  - Fixed 9-byte truncated content keys (space optimization)
  - 18-byte entries (9-byte key + 5-byte location + 4-byte size)
  - 256 bucket-based organization (0x00-0xFF)
  - Packed 5-byte location field (10-bit archive ID + 30-bit offset)
  - Jenkins hash validation
  - Mixed endianness (header little-endian, entries mixed)
  - Bucket algorithm: XOR first 9 bytes, then XOR nibbles
  - Filename format: {bucket:02x}{version:06x}.idx
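Two of the features above, the bucket fold and the packed location, lend themselves to a short sketch. Both function names are hypothetical. The fold is written literally as described (XOR the first 9 bytes, then XOR the two nibbles of the result), which produces a value in the range 0x0-0xF. For the location, the 5 bytes are assumed to be read as one big-endian 40-bit value with the archive ID in the top 10 bits.

```rust
/// Folds the first 9 key bytes into a bucket value.
/// Returns None if the key is shorter than 9 bytes.
fn bucket_for_key(key: &[u8]) -> Option<u8> {
    let prefix = key.get(..9)?;
    // XOR all nine bytes together...
    let x = prefix.iter().fold(0u8, |acc, &b| acc ^ b);
    // ...then XOR the low nibble with the high nibble.
    Some((x & 0x0F) ^ (x >> 4))
}

/// Splits a packed 5-byte location into (archive ID, offset).
/// Assumption: big-endian, top 10 bits archive ID, low 30 bits offset.
fn unpack_location(loc: [u8; 5]) -> (u16, u32) {
    let v = loc.iter().fold(0u64, |acc, &b| (acc << 8) | u64::from(b));
    let archive_id = (v >> 30) as u16;
    let offset = (v & 0x3FFF_FFFF) as u32;
    (archive_id, offset)
}
```

A 10-bit archive ID tops out at 1023, and a 30-bit offset addresses up to 1 GiB within a single data file.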
Version Detection
The version field is at offset 8 in the header (16-bit little-endian). The implementation validates version equals 7 and warns on unexpected versions.
Implementation Status
- cascette-client-storage: Full support for version 7 with parser and builder
- No earlier versions documented (version 7 is standard for modern CASC)
Key Differences Between Index Systems
| Feature | CDN Index (.index) | Local Index (.idx) |
|---|---|---|
| Version | 1 (footer-based) | 7 (header-based) |
| Protocol | TACT (network) | CASC (local storage) |
| Key Type | Encoding keys | Content keys |
| Key Length | Variable (16 typical) | Fixed 9-byte truncated |
| Structure | Sequential chunks | Bucket algorithm |
| Validation | MD5 hash | Jenkins hash |
| Endianness | Mixed (mostly big) | Mixed (header little) |
| Entry Size | Variable (24 typical) | Fixed 18 bytes |
| Location | CDN download | Client Data/ directory |
| Crate | cascette-formats | cascette-client-storage |
References
- See Encoding Documentation for key lookup
- See BLTE Format for archive content structure
- See CDN Architecture for remote retrieval
- See Format Transitions for format evolution tracking