CDN Mirroring and Archival Strategy

Overview

This document outlines strategies for mirroring Blizzard’s CDN content for WoW using NGDP/CASC.

Note: Python code examples in this document are conceptual pseudocode illustrating mirroring workflows. For working code, see the cascette mirror CLI command or reference implementations in References.

Rationale for Mirroring

Blizzard removes older builds from CDN within days to weeks of new patches (see Archival Urgency below). Mirroring preserves builds that would otherwise be lost, enabling:

Preservation: Maintain access to historical builds after CDN removal
Development: Test CASC implementations against known data offline
Performance: Local access avoids CDN latency and bandwidth limits

Target Products

Focus on World of Warcraft products:

Product Code	Description	Update Frequency
wow	Retail/Live	Weekly patches
wowt	Public Test Realm	Frequent updates
wow_beta	Beta servers	Daily during beta
wow_classic	Classic (Wrath/Cata)	Bi-weekly
wow_classic_era	Classic Era (Vanilla)	Rare updates
wow_classic_ptr	Classic PTR	During test cycles
wow_classic_titan	Classic Titan (CN only, WotLK 3.80.x)	Unknown
wow_anniversary	Classic Anniversary (TBC 2.5.x)	Unknown

Archival Urgency

CDN retention windows observed in testing:

Product	Retention Window	Archival Priority
wow (Retail)	14-15 days	High - Daily checks
wow_classic	2-4 weeks	Medium - Weekly checks
wow_classic_era	~3 months	Low - Monthly checks
wow_beta	7-10 days	Critical - Continuous
wowt (PTR)	10-14 days	High - Every 2-3 days

Critical Finding: Retail builds disappear from the CDN roughly two weeks (14-15 days) after a new patch.

Build Discovery

Track new builds via the Ribbit protocol:

Sequence Number Monitoring

# Query summary endpoint
echo -e "v1/summary\r\n" | nc us.version.battle.net 1119

# Response includes sequence numbers
## seqn = 2241282

Monitor sequence number changes:

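async def check_for_updates():
    summary = await ribbit_client.get_summary()

    for product in summary.products:
        # get_sequence is assumed to return None for a product not seen before
        stored_seqn = database.get_sequence(product.name)

        if stored_seqn is None or product.seqn > stored_seqn:
            # New build detected! Store the sequence number only after
            # processing succeeds, so a failed run is retried on the next check
            await process_new_build(product)
            database.update_sequence(product.name, product.seqn)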

Version Information

# Get specific product versions
echo -e "v1/products/wow/versions\r\n" | nc us.version.battle.net 1119
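Where nc is not available, the same queries can be issued from Python with the standard library. The following is a minimal sketch, not the cascette client: it assumes the server closes the connection after replying (as the nc examples suggest), and the names format_request and ribbit_request are illustrative.

```python
import socket

RIBBIT_HOST = "us.version.battle.net"  # host used by the nc examples
RIBBIT_PORT = 1119


def format_request(endpoint):
    """Encode a Ribbit command such as 'v1/products/wow/versions'."""
    return endpoint.strip().encode("ascii") + b"\r\n"


def ribbit_request(endpoint, host=RIBBIT_HOST, port=RIBBIT_PORT, timeout=10.0):
    """Send one command and return the raw response text.

    Assumption: the server closes the connection once it has replied,
    so the response is read until end of stream.
    """
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(format_request(endpoint))
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")
```

Calling ribbit_request("v1/products/wow/versions") returns the raw response; any envelope around the table still has to be stripped before parsing.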

CDN Path Discovery

Critical: Always Extract CDN Paths

# Get CDN information - NEVER hardcode paths!
echo -e "v1/products/wow/cdns\r\n" | nc us.version.battle.net 1119

Example response:

Region!STRING:0|Hosts!STRING:0|Path!STRING:0|ConfigPath!STRING:0
us|level3.blizzard.com edgecast.blizzard.com|tpr/wow|tpr/configs/data
eu|level3.blizzard.com edgecast.blizzard.com|tpr/wow|tpr/configs/data
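A table like the one above can be turned into dictionaries with a few lines of Python. This is a sketch under two assumptions: the table text has already been separated from any surrounding envelope, and only the column name before the "!" matters. The function name parse_psv is illustrative.

```python
def parse_psv(text):
    """Parse a pipe-separated Ribbit table into a list of dicts.

    The first non-comment line is the header; each column is written as
    Name!TYPE:width, and only the name is kept here.
    """
    rows = []
    columns = None
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and '##' metadata lines such as '## seqn = ...'.
        if not line or line.startswith("#"):
            continue
        fields = line.split("|")
        if columns is None:
            columns = [field.split("!", 1)[0] for field in fields]
            continue
        rows.append(dict(zip(columns, fields)))
    return rows


response = """\
Region!STRING:0|Hosts!STRING:0|Path!STRING:0|ConfigPath!STRING:0
us|level3.blizzard.com edgecast.blizzard.com|tpr/wow|tpr/configs/data
eu|level3.blizzard.com edgecast.blizzard.com|tpr/wow|tpr/configs/data
"""

cdns = parse_psv(response)
us = next(row for row in cdns if row["Region"] == "us")
print(us["Path"])           # tpr/wow
print(us["Hosts"].split())  # ['level3.blizzard.com', 'edgecast.blizzard.com']
```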

CRITICAL: The Path field (tpr/wow) must be used for URL construction:

# CORRECT - Uses path from CDN response
cdn_url = f"http://{host}/{path}/data/{hash[:2]}/{hash[2:4]}/{hash}"

# WRONG - Hardcoded path
cdn_url = f"http://{host}/tpr/wow/data/{hash[:2]}/{hash[2:4]}/{hash}"

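All WoW products currently share the tpr/wow path, regardless of product code:

wow, wow_classic, wow_classic_era, wow_classic_titan, and wow_anniversary all use tpr/wow
Never derive the path from the product name (wow_classic does not map to tpr/wow_classic); read it from the CDN response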
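As a sketch of the rule above, the helper below builds one data URL per host from a parsed CDN row instead of a hardcoded path. The row layout (a dict with Hosts and Path keys), the function names, and the example hash are assumptions made for illustration; the 32-character check assumes hex-encoded MD5 hashes.

```python
def cdn_data_url(host, path, content_hash):
    """Build a data URL from a CDN host, the Path field, and a hex hash."""
    h = content_hash.lower()
    if len(h) != 32 or any(c not in "0123456789abcdef" for c in h):
        raise ValueError(f"expected a 32-character hex hash, got {content_hash!r}")
    return f"http://{host}/{path.strip('/')}/data/{h[:2]}/{h[2:4]}/{h}"


def candidate_urls(cdn_row, content_hash):
    """Return one URL per host listed in a CDN row, in the listed order."""
    hosts = cdn_row["Hosts"].split()
    return [cdn_data_url(host, cdn_row["Path"], content_hash) for host in hosts]


row = {"Hosts": "level3.blizzard.com edgecast.blizzard.com", "Path": "tpr/wow"}
example_hash = "0123456789abcdef0123456789abcdef"  # placeholder, not a real file
for url in candidate_urls(row, example_hash):
    print(url)
```

Returning every host rather than the first one lets the caller fall back to the next host when a download fails.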

Essential Files

Priority order for archival:

1. Configuration Files (Critical)

BuildConfig: Build-specific settings
CDNConfig: CDN and archive information
ProductConfig: Product metadata

2. System Files (Required)

Encoding: Maps content keys to encoding keys (~500MB-2GB)
Root: File manifest
Install: Installation manifest
Download: Download priority

3. Indices (Important)

Archive indices (.index files)
Patch indices for updates

4. Data Archives (Bulk)

Archive files (named by hash on the CDN; data.### is the local install naming)
Largest storage requirement
Can be fetched on-demand
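The configuration files at the top of this list are what lead to everything else, so a mirror needs to read them. The sketch below assumes the plain-text "key = value" layout with "#" comment lines; the sample text and its hashes are placeholders, and parse_config is an illustrative name. Values are kept as lists because some keys carry more than one token.

```python
def parse_config(text):
    """Parse 'key = value' lines into a dict of key -> list of tokens."""
    config = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if not sep:
            continue  # ignore lines that are not key/value pairs
        config[key.strip()] = value.split()
    return config


# Hypothetical sample with placeholder hashes, for illustration only.
sample = """\
# Build Configuration

root = 11111111111111111111111111111111
encoding = 22222222222222222222222222222222 33333333333333333333333333333333
"""

config = parse_config(sample)
print(config["root"][0])
print(len(config["encoding"]))  # 2
```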

Mirroring Architecture

Storage Structure

/mirror
├── configs/
│   └── data/
│       ├── {hash[0:2]}/
│       │   └── {hash[2:4]}/
│       │       └── {hash}
├── data/
│   ├── {hash[0:2]}/
│   │   └── {hash[2:4]}/
│   │       └── {hash}
├── indices/
│   └── *.index
└── metadata.db
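The two-level fan-out in this tree can be computed directly from a hash. A minimal sketch, assuming hex hashes are stored in lowercase; mirror_path is an illustrative name.

```python
from pathlib import Path


def mirror_path(root, kind, content_hash):
    """Map a hash to its location under the mirror root.

    kind is 'data' or 'configs/data', matching the tree above.
    """
    h = content_hash.lower()
    return Path(root) / kind / h[:2] / h[2:4] / h


p = mirror_path("/mirror", "data", "ABCDEF0123456789abcdef0123456789")
print(p.as_posix())  # /mirror/data/ab/cd/abcdef0123456789abcdef0123456789
```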

Database Schema

CREATE TABLE builds (
    id SERIAL PRIMARY KEY,
    product VARCHAR(50),
    build_config VARCHAR(32),
    cdn_config VARCHAR(32),
    build_name VARCHAR(100),
    detected_at TIMESTAMP,
    archived BOOLEAN DEFAULT FALSE
);

CREATE TABLE files (
    hash VARCHAR(32) PRIMARY KEY,
    size BIGINT,
    type VARCHAR(20),
    downloaded_at TIMESTAMP
);
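The schema above uses PostgreSQL-style types. If metadata.db is a SQLite file, the builds table could be adapted as sketched below. The UNIQUE constraint, the helper names, and the sample values are assumptions added for illustration, not part of the original schema.

```python
import sqlite3
from datetime import datetime, timezone

SCHEMA = """
CREATE TABLE IF NOT EXISTS builds (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    product TEXT NOT NULL,
    build_config TEXT NOT NULL,
    cdn_config TEXT,
    build_name TEXT,
    detected_at TEXT,
    archived INTEGER DEFAULT 0,
    UNIQUE (product, build_config)  -- assumption: one row per product/build
);
"""


def record_build(conn, product, build_config, cdn_config, build_name):
    """Insert a build if unseen; return True when a new row was added."""
    now = datetime.now(timezone.utc).isoformat()
    cur = conn.execute(
        "INSERT OR IGNORE INTO builds "
        "(product, build_config, cdn_config, build_name, detected_at) "
        "VALUES (?, ?, ?, ?, ?)",
        (product, build_config, cdn_config, build_name, now),
    )
    conn.commit()
    return cur.rowcount == 1


def pending_builds(conn):
    """Return (product, build_config) pairs not yet marked archived."""
    rows = conn.execute(
        "SELECT product, build_config FROM builds WHERE archived = 0 ORDER BY id"
    )
    return rows.fetchall()


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
record_build(conn, "wow", "a" * 32, "b" * 32, "example-build")
record_build(conn, "wow", "a" * 32, "b" * 32, "example-build")  # ignored duplicate
print(pending_builds(conn))
```

The duplicate insert is ignored, so repeated checks of the same build do not create extra rows.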

Download Strategy

Priority-Based Downloading

class MirrorStrategy:
    def __init__(self, full_mirror=False):
        # Set full_mirror=True to also fetch the bulk data archives
        self.full_mirror = full_mirror
        self.priorities = {
            'configs': 1,      # Highest priority
            'encoding': 2,
            'root': 3,
            'install': 4,
            'indices': 5,
            'data': 10        # Lowest priority
        }

    async def mirror_build(self, build_info):
        # 1. Download configs first
        await self.download_configs(build_info)

        # 2. Get encoding file
        await self.download_encoding(build_info)

        # 3. Download root and install manifests
        await self.download_root(build_info)
        await self.download_install(build_info)

        # 4. Download indices
        indices = await self.download_indices(build_info)

        # 5. Optional: Download data archives
        if self.full_mirror:
            await self.download_archives(indices)

Bandwidth Management

Concurrent downloads: 4-8 connections
Rate limiting: Respect CDN limits
Retry logic: Handle transient failures
Resume support: Continue interrupted downloads
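The first three points can be combined in a small asyncio helper. This is a sketch, not the cascette downloader: fetch stands for any coroutine function that downloads one URL, the retry counts and delays are arbitrary defaults, and resume support is left out.

```python
import asyncio
import random


async def fetch_with_retry(fetch, url, attempts=4, base_delay=0.5):
    """Call the coroutine function fetch(url), retrying transient failures."""
    for attempt in range(attempts):
        try:
            return await fetch(url)
        except (OSError, asyncio.TimeoutError):
            if attempt == attempts - 1:
                raise
            # Exponential backoff (0.5s, 1s, 2s, ...) plus up to 10% jitter.
            delay = base_delay * (2 ** attempt)
            await asyncio.sleep(delay * (1 + random.random() * 0.1))


async def download_all(fetch, urls, max_concurrent=4):
    """Fetch every URL with at most max_concurrent requests in flight."""
    semaphore = asyncio.Semaphore(max_concurrent)

    async def worker(url):
        async with semaphore:
            return url, await fetch_with_retry(fetch, url)

    results = await asyncio.gather(*(worker(u) for u in urls))
    return dict(results)


async def fake_fetch(url):
    # Stand-in for a real HTTP request; returns the URL length as fake data.
    await asyncio.sleep(0)
    return len(url)


print(asyncio.run(download_all(fake_fetch, ["http://example.invalid/a"])))
```

The semaphore caps concurrency without a worker pool, and the jitter keeps retries from many tasks from hitting the CDN at the same moment.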

Incremental Updates

Track changes efficiently:

async def incremental_update(product):
    current_build = await get_current_build(product)
    stored_build = database.get_latest_build(product)

    if current_build != stored_build:
        # Download only new/changed files
        new_files = await diff_builds(current_build, stored_build)
        await download_files(new_files)

        database.update_build(product, current_build)
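Because content is addressed by hash, a changed file appears under a new hash, so the diff step can be a set difference. A minimal sketch, assuming each build is represented by an iterable of the hashes it references; the short hash strings are placeholders.

```python
def diff_builds(current_files, stored_files):
    """Return hashes needed by the current build that are not yet stored."""
    return sorted(set(current_files) - set(stored_files))


stored = ["aa11", "bb22"]            # placeholder hashes already mirrored
current = ["aa11", "cc33", "dd44"]   # placeholder hashes in the new build
print(diff_builds(current, stored))  # ['cc33', 'dd44']
```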

Verification

Ensure data integrity:

Hash Verification

def verify_file(filepath, expected_hash):
    actual_hash = calculate_md5(filepath)
    if actual_hash != expected_hash:
        raise IntegrityError(f"Hash mismatch: {filepath}")
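The snippet above relies on two helpers it does not define. One possible version is sketched here using hashlib; reading in fixed-size chunks keeps memory use flat for large files. The chunk size is an arbitrary choice.

```python
import hashlib
import os
import tempfile


class IntegrityError(Exception):
    """Raised when a file's hash does not match the expected value."""


def calculate_md5(filepath, chunk_size=1024 * 1024):
    """Return the hex MD5 of a file, read in chunk_size pieces."""
    digest = hashlib.md5()
    with open(filepath, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


# Small demonstration on a temporary file.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"hello")
print(calculate_md5(tmp.name))  # 5d41402abc4b2a76b9719d911017c592
os.unlink(tmp.name)
```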

Archive Integrity

Verify BLTE headers
Check chunk checksums
Validate encoding entries
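The first two checks can be sketched as follows. The layout assumed here follows community documentation of BLTE: a "BLTE" magic, a big-endian header size, a flags byte of 0x0F, a 24-bit chunk count, then one 24-byte entry per chunk (compressed size, decompressed size, MD5 of the stored chunk). Treat it as an assumption to check against the reference implementations; verify_blte is an illustrative name.

```python
import hashlib
import struct


def verify_blte(blob):
    """Check the BLTE magic and per-chunk MD5s; return the chunk count.

    Raises ValueError on a malformed header or a checksum mismatch.
    """
    if len(blob) < 8 or blob[:4] != b"BLTE":
        raise ValueError("missing BLTE magic")
    (header_size,) = struct.unpack(">I", blob[4:8])
    if header_size == 0:
        return 1  # single chunk without a chunk table: nothing to check here
    if header_size < 12 or len(blob) < header_size:
        raise ValueError("truncated BLTE header")
    if blob[8] != 0x0F:
        raise ValueError("unsupported chunk table format")
    chunk_count = int.from_bytes(blob[9:12], "big")
    if header_size < 12 + 24 * chunk_count:
        raise ValueError("header too small for chunk table")
    offset = header_size
    for index in range(chunk_count):
        entry = 12 + 24 * index
        compressed_size, _decompressed = struct.unpack(">II", blob[entry:entry + 8])
        expected = blob[entry + 8:entry + 24]
        chunk = blob[offset:offset + compressed_size]
        if len(chunk) != compressed_size:
            raise ValueError(f"chunk {index} is truncated")
        if hashlib.md5(chunk).digest() != expected:
            raise ValueError(f"chunk {index} checksum mismatch")
        offset += compressed_size
    return chunk_count


# Synthetic one-chunk file: mode byte 'N' (stored uncompressed) plus payload.
payload = b"N" + b"hello world"
table = struct.pack(">II", len(payload), len(payload) - 1) + hashlib.md5(payload).digest()
blob = b"BLTE" + struct.pack(">I", 12 + len(table)) + b"\x0f" + (1).to_bytes(3, "big") + table + payload
print(verify_blte(blob))  # 1
```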

Storage Optimization

Deduplication

Content-addressed storage automatically deduplicates:

def store_file(content, hash):
    path = get_path_from_hash(hash)
    if not os.path.exists(path):
        # Only store if not already present
        write_file(path, content)

Compression

Keep BLTE files compressed
Use filesystem compression for configs
Consider archive formats for old builds

Historical Build Recovery

Using External Sources

Community archives: Shared build collections and private archives
Wayback Machine: Historical Ribbit responses and cached configuration files
Torrent archives: Community-shared build collections and distributed preservation efforts

Reconstruction

Rebuild missing content:

flowchart TD
    A[Partial Build] --> B[Identify Missing]
    B --> C[Search Mirrors]

    C --> D{Found?}
    D -->|Yes| E[Download Missing]
    D -->|No| F[Check Archives]

    F --> G{In Archive?}
    G -->|Yes| H[Extract Content]
    G -->|No| I[Search Community]

    E --> J[Verify Hashes]
    H --> J

    I --> K{Available?}
    K -->|Yes| L[Request Copy]
    K -->|No| M[Document Gap]

    L --> J
    J --> N[Update Archive]
    M --> O[Gap Report]

    style A stroke-width:4px
    style N stroke-width:4px
    style O stroke-width:3px,stroke-dasharray:5 5
    style D stroke-width:3px,stroke-dasharray:5 5
    style G stroke-width:3px,stroke-dasharray:5 5
    style K stroke-width:3px,stroke-dasharray:5 5
    style J stroke-width:2px
    style B stroke-width:2px

Legal Considerations

Fair Use

Archival under fair use principles:

Research: Academic study of game development
Education: Teaching game architecture
Preservation: Cultural heritage of gaming
Non-commercial: No monetization of archives

Best Practices

Respect intellectual property
Don’t distribute copyrighted content
Use for personal/research purposes
Cooperate with takedown requests

Reference Implementations

For detailed analysis of NGDP/CASC reference implementations, see references.md.

Key implementations examined:

CascLib: Complete C++ library with 10+ years of development
TACT.Net: C# architecture with modular design
rustycasc: Rust implementation with type safety
BlizzTrack: Production monitoring with database persistence
blizztools: Rust CLI for NGDP operations
blizzget: C++ downloader with custom version support
tactmon: Advanced C++ monitoring with template ORM
TACTSharp: .NET extraction library with memory-mapped files

These implementations informed cascette-rs design for CDN interaction and content resolution.

Implementation Examples

Build Tracker

import asyncio
import logging

logger = logging.getLogger(__name__)

class BuildTracker:
    def __init__(self, products):
        self.products = products
        self.check_interval = 300  # 5 minutes

    async def run(self):
        while True:
            for product in self.products:
                await self.check_product(product)
            await asyncio.sleep(self.check_interval)

    async def check_product(self, product):
        # ribbit, is_archived, and archive_build are assumed to be provided elsewhere
        try:
            versions = await ribbit.get_versions(product)
            cdns = await ribbit.get_cdns(product)

            for region in versions.regions:
                build_config = region.build_config
                if not self.is_archived(build_config):
                    await self.archive_build(product, region, cdns)
        except Exception as e:
            # Log and continue so one failing product does not stop the loop
            logger.error(f"Failed to check {product}: {e}")

Archive Manager

from pathlib import Path

class ArchiveManager:
    def __init__(self, storage_path):
        # Path() lets the / operator below work when a plain string is passed
        self.storage = Path(storage_path)
        self.cdn_client = CDNClient()

    async def archive_build(self, build_info):
        # Create build directory
        build_dir = self.storage / build_info.product / build_info.build_config
        build_dir.mkdir(parents=True, exist_ok=True)

        # Download in priority order
        await self.download_configs(build_info)
        await self.download_encoding(build_info)
        await self.download_root(build_info)

        # Mark as archived
        self.mark_archived(build_info)

Monitoring and Alerts

Health Checks

import shutil
from datetime import datetime, timezone

class MirrorHealth:
    def __init__(self, storage_path):
        self.storage_path = storage_path

    async def check_health(self):
        return {
            'disk_space': self.check_disk_space(),
            'cdn_connectivity': await self.check_cdn(),
            'database': self.check_database(),
            'last_check': datetime.now(timezone.utc)
        }

    def check_disk_space(self):
        usage = shutil.disk_usage(self.storage_path)
        return {
            'used': usage.used,
            'free': usage.free,
            'percent': (usage.used / usage.total) * 100
        }

Disaster Recovery

Backup Strategy

Primary Mirror: Fast SSD storage
Secondary Backup: HDD archive
Cloud Backup: Critical configs only
Community Sharing: Torrent distribution

Recovery Procedures

# Restore from backup
rsync -av /backup/mirror/ /primary/mirror/

# Print checksums of the restored index files for comparison with known-good values
find /primary/mirror -type f -name "*.index" -print0 | xargs -0 md5sum

# Rebuild database
python rebuild_metadata.py /primary/mirror

Community Coordination

Shared Resources

Mirror status: Track who has what builds
Gap identification: Find missing builds
Bandwidth sharing: Distribute download load
Verification: Cross-check integrity

Future Considerations

Automated build discovery with predictive downloading before CDN removal
Differential compression between builds to reduce storage
Geographic replication for redundancy

Tools and Resources

Existing Tools

CASCExplorer: Browse CASC archives
WoW.tools: Online CASC viewer
TACTSharp: .NET extraction library
CascLib: C++ CASC library

Monitoring Services

BlizzTrack: Real-time build tracking
Wago.tools: API for build information

Community

Discord servers: Coordinate archival efforts
GitHub repos: Share tools and scripts
Forums: Technical discussions

The 14-15 day retention window for retail WoW makes automated monitoring and archival essential.
