Game Dev Testing & QA: From 'It Compiles' to 'It Ships'

This is Part 4 of the “Build Your Game Dev Pipeline” series. Part 1: Task Management → Part 2: Perforce + Jenkins Triggers → Part 3: Build Configuration → Part 4: Testing & QA → Part 5: Deployment.

The Green Checkmark of Lies

The Jenkins dashboard is a wall of blue. Every build: success. The producer opens Slack: “Milestone build is ready for the publisher.”

QA opens it. Main menu runs at 12 FPS. Someone checked in uncompressed 8K textures for every UI element. The tutorial crashes two minutes in — a nullptr in a quest system that worked fine on the programmer’s machine because they had a save file that skipped the intro. And the PS5 build? Hard lock on the loading screen. A shader permutation that only manifests on AMD GPUs.

Every one of these builds compiled. Every one passed Jenkins. Every one was broken.

“Build succeeded” is the most dangerous lie in game development. It means the compiler didn’t throw an error. It says nothing about whether the game actually works. The gap between “it compiles” and “it ships” is where QA lives — and most studios try to fill that gap with manual testing alone.

Manual testing catches problems. Automated testing prevents them. This post is about building the second half of that equation into the pipeline you’ve been constructing since Part 1.

Why Game Dev Testing Is Different

If you’ve done testing in web dev, forget half of what you know. Game testing has fundamentally different constraints.

Dimension	Web/Enterprise	Game Dev
Determinism	Same input → same output	Physics, AI, animation blending = non-deterministic
Platform matrix	2-3 browsers	Win64, PS5, Xbox Series X, Switch, maybe mobile
Content as code	Assets are static files	A texture change can break performance budgets
Performance	“Fast enough”	16.67ms per frame or players feel it
Output validation	Assert on data	“Does this look right?” is a real test case
Build time	Seconds to minutes	Minutes to hours

The biggest shift: performance is correctness. A web app that’s 200ms slow is annoying. A game that drops below 30 FPS is broken. This means performance testing isn’t optional — it’s a first-class citizen alongside functional tests.

The Testing Pyramid for Games

The classic testing pyramid still applies, but with game-specific layers:

              ┌─────────────┐
              │   Manual QA  │  ← Game feel, art direction, "is it fun?"
             ┌┤             ├┐
            ┌┤│  Smoke/E2E  │├┐  ← Boot test, critical paths
           ┌┤│├─────────────┤│├┐
          ┌┤││ Integration  ││├┐  ← Systems interaction, asset loading
         ┌┤│││─────────────│││├┐
         │││││  Unit Tests  │││││  ← C++/C# module tests
         └┴┴┴┴─────────────┴┴┴┴┘

  ║ Performance ║  ← Parallel dimension (runs at every level)

Here’s the uncomfortable truth: most game studios have zero automated tests. None. If you add a single boot test — does the game launch and reach the main menu? — you’re already ahead of the majority of the industry.

Start there. Seriously. A boot test that runs on every build catches more issues than you’d expect: missing assets, broken cook, initialization crashes, shader compile failures. It’s the highest-ROI test you’ll ever write.

Recommended test cadence:

Test Type	When to Run	Target Duration
Unit tests	Every commit	< 5 min
Boot/smoke test	Every build	< 10 min
Integration tests	Nightly	< 30 min
Full smoke suite	Nightly	< 30 min
Performance tests	Nightly + release candidates	< 45 min
Full platform matrix	Release candidates only	< 2 hours

Adding Automated Tests to Your Jenkins Pipeline

Note: These examples show the general pattern — adapt paths, flags, and API calls to your engine version and project setup. They’re starting points, not copy-paste solutions.

In Part 3, we built a Jenkins pipeline that syncs, builds, and archives. Now we add test stages. Here’s the extended pipeline:

// Extending Part 3's pipeline with test stages
pipeline {
    agent { label 'build-machine' }

    parameters {
        string(name: 'CHANGELIST', description: 'Perforce changelist number')
        choice(name: 'BUILD_TYPE', choices: ['Development', 'Shipping', 'Test', 'Debug'],
               description: 'Build configuration')
        choice(name: 'PLATFORM', choices: ['Win64', 'PS5', 'XboxSeriesX'],
               description: 'Target platform')
    }

    stages {
        stage('Sync') {
            steps {
                p4sync(
                    credential: 'perforce-creds',
                    populate: syncOnly(force: false, have: true, quiet: true),
                    source: depotSource('//depot/GameProject/...')
                )
            }
        }

        stage('Build') {
            steps {
                // ... (same as Part 3)
            }
        }

        stage('Unit Tests') {
            steps {
                script {
                    sh """
                        ${env.UE_ROOT}/Engine/Build/BatchFiles/RunUAT.sh RunTests \
                        -project="\${WORKSPACE}/GameProject.uproject" \
                        -testfilter="Project.Unit" \
                        -reportexportpath="\${WORKSPACE}/TestResults/Unit" \
                        -log
                    """
                }
            }
            post {
                always {
                    junit 'TestResults/Unit/**/*.xml'
                }
            }
        }

        stage('Smoke Tests') {
            when {
                anyOf {
                    expression { params.BUILD_TYPE == 'Test' }
                    expression { params.BUILD_TYPE == 'Shipping' }
                    triggeredBy 'TimerTrigger' // Nightly builds
                }
            }
            steps {
                script {
                    sh """
                        ${env.UE_ROOT}/Engine/Build/BatchFiles/RunUAT.sh RunTests \
                        -project="\${WORKSPACE}/GameProject.uproject" \
                        -testfilter="Project.Smoke" \
                        -reportexportpath="\${WORKSPACE}/TestResults/Smoke" \
                        -nullrhi \
                        -log
                    """
                }
            }
            post {
                always {
                    junit 'TestResults/Smoke/**/*.xml'
                }
            }
        }

        stage('Performance Tests') {
            when {
                anyOf {
                    expression { params.BUILD_TYPE == 'Shipping' }
                    triggeredBy 'TimerTrigger'
                }
            }
            steps {
                script {
                    sh """
                        ${env.UE_ROOT}/Engine/Build/BatchFiles/RunUAT.sh RunTests \
                        -project="\${WORKSPACE}/GameProject.uproject" \
                        -testfilter="Project.Performance" \
                        -reportexportpath="\${WORKSPACE}/TestResults/Perf" \
                        -log
                    """
                }
            }
            post {
                always {
                    junit 'TestResults/Perf/**/*.xml'
                    archiveArtifacts artifacts: 'TestResults/Perf/**', allowEmptyArchive: true
                }
            }
        }

        stage('Archive') {
            steps {
                archiveArtifacts artifacts: 'Builds/**', fingerprint: true
            }
        }
    }

    post {
        failure {
            slackSend channel: '#builds',
                message: "Build FAILED: CL ${params.CHANGELIST} (${params.PLATFORM} ${params.BUILD_TYPE})"
        }
        success {
            slackSend channel: '#builds',
                message: "Build + Tests PASSED: CL ${params.CHANGELIST} (${params.PLATFORM} ${params.BUILD_TYPE})"
        }
    }
}

Notice the when blocks. Unit tests run on every build. Smoke tests only run for Test/Shipping builds or nightly triggers. Performance tests only run for Shipping builds or nightlies. This keeps commit builds fast while still catching regressions before they reach QA.

Unreal Engine: Gauntlet Tests

Unreal’s Gauntlet framework is the standard for automated functional testing. It launches the game, runs test scripts, and reports results. Here’s how to invoke it:

#!/bin/bash
# run_gauntlet_tests.sh — Boot test via Unreal Gauntlet

UE_ROOT="/opt/UnrealEngine"
PROJECT_PATH="${WORKSPACE}/GameProject.uproject"
PLATFORM="${1:-Win64}"
TEST_NAME="${2:-BootTest}"
RESULTS_DIR="${WORKSPACE}/TestResults/Gauntlet"

mkdir -p "${RESULTS_DIR}"

# RunGauntlet = launch game + run Gauntlet test scripts (full boot/functional tests)
# RunTests    = run automation framework tests (unit/smoke/perf, used in Jenkins stages above)
"${UE_ROOT}/Engine/Build/BatchFiles/RunUAT.sh" \
    RunGauntlet \
    -project="${PROJECT_PATH}" \
    -build=local \
    -run \
    -test="${TEST_NAME}" \
    -platform="${PLATFORM}" \
    -configuration=Test \
    -nullrhi \
    -unattended \
    -nosplash \
    -nosound \
    -ReportExportPath="${RESULTS_DIR}" \
    -log="${RESULTS_DIR}/${TEST_NAME}.log" \
    -timeout=600

EXIT_CODE=$?

# Parse results for Jenkins
if [ -f "${RESULTS_DIR}/${TEST_NAME}_Summary.json" ]; then
    echo "Test results exported to ${RESULTS_DIR}"
else
    echo "WARNING: No test summary found"
fi

exit ${EXIT_CODE}

The flags that matter: -nullrhi runs without a GPU (for headless CI machines), -unattended prevents dialog boxes, -timeout=600 kills stuck tests after 10 minutes. Remove -nullrhi if your tests need to validate rendering.

Unity: Test Runner CLI

Unity’s Test Framework runs from the command line. Here’s the invocation plus a sample PlayMode test:

#!/bin/bash
# run_unity_tests.sh — Unity Test Runner CLI

UNITY_PATH="/opt/Unity/Editor/Unity"
PROJECT_PATH="${WORKSPACE}/GameProject"
RESULTS_DIR="${WORKSPACE}/TestResults/Unity"

mkdir -p "${RESULTS_DIR}"

# Run EditMode tests (fast, no game launch)
"${UNITY_PATH}" -batchmode -nographics \
    -projectPath "${PROJECT_PATH}" \
    -runTests \
    -testPlatform EditMode \
    -testResults "${RESULTS_DIR}/editmode-results.xml" \
    -logFile "${RESULTS_DIR}/editmode.log"

# Run PlayMode tests (launches game, slower)
"${UNITY_PATH}" -batchmode -nographics \
    -projectPath "${PROJECT_PATH}" \
    -runTests \
    -testPlatform PlayMode \
    -testResults "${RESULTS_DIR}/playmode-results.xml" \
    -logFile "${RESULTS_DIR}/playmode.log"

// BootTest.cs — Unity PlayMode test that verifies the game launches
using System.Collections;
using NUnit.Framework;
using UnityEngine;
using UnityEngine.TestTools;
using UnityEngine.SceneManagement;

public class BootTest
{
    [UnityTest]
    public IEnumerator GameBootsToMainMenu()
    {
        // Load the boot scene
        SceneManager.LoadScene("Boot", LoadSceneMode.Single);
        yield return new WaitForSeconds(5f);

        // Verify we reached the main menu
        var mainMenu = GameObject.Find("MainMenuCanvas");
        Assert.IsNotNull(mainMenu, "Main menu did not appear after boot");
    }

    [UnityTest]
    public IEnumerator AllScenesLoadWithoutErrors()
    {
        var sceneCount = SceneManager.sceneCountInBuildSettings;
        for (int i = 0; i < sceneCount; i++)
        {
            var scenePath = SceneUtility.GetScenePathByBuildIndex(i);
            SceneManager.LoadScene(i, LoadSceneMode.Single);
            yield return new WaitForSeconds(2f);

            // Verify the correct scene actually loaded (not just any scene)
            Assert.AreEqual(scenePath, SceneManager.GetActiveScene().path,
                $"Expected scene {scenePath} but got {SceneManager.GetActiveScene().path}");
        }
    }
}

The boot test is your single highest-value automated test. Write this one first, run it on every build, and you’ll catch the majority of “the game doesn’t even launch” issues before QA wastes time on them.

Performance Budgets and Gates

A build that runs at 15 FPS is not a successful build. Performance budgets make this explicit: define hard limits, fail the build if they’re exceeded.

Define Your Budgets

Metric	PC (Recommended)	PS5	Switch
Frame time	< 16.67ms (60 FPS)	< 16.67ms (60 FPS)	< 33.33ms (30 FPS)
Memory	< 8 GB	< 12 GB (of 16 GB)	< 3.5 GB (of 4 GB)
Load time (initial)	< 15s	< 10s (SSD)	< 30s
Draw calls	< 3,000/frame	< 2,500/frame	< 1,500/frame
Texture memory	< 4 GB	< 6 GB	< 1.5 GB
Disk size (build)	< 80 GB	< 80 GB	< 16 GB

These are starting points. Adjust for your game’s genre and art style. An open-world RPG has different budgets than a puzzle game.

Performance Gate in Jenkins

stage('Performance Gate') {
    when {
        anyOf {
            expression { params.BUILD_TYPE == 'Shipping' }
            triggeredBy 'TimerTrigger'
        }
    }
    steps {
        script {
            // Run performance benchmark
            sh """
                ${env.UE_ROOT}/Engine/Build/BatchFiles/RunUAT.sh RunTests \
                -project="\${WORKSPACE}/GameProject.uproject" \
                -testfilter="Project.Performance.Benchmark" \
                -nullrhi=false \
                -gpu \
                -reportexportpath="\${WORKSPACE}/TestResults/Perf" \
                -log
            """

            // Parse results and enforce budgets
            // This file must be written by your test harness (see C++ example below)
            def perfResults = readJSON file: 'TestResults/Perf/perf_summary.json'

            def failures = []
            if (perfResults.avg_frame_time_ms > 16.67) {
                failures.add("Frame time ${perfResults.avg_frame_time_ms}ms exceeds 16.67ms budget")
            }
            if (perfResults.peak_memory_mb > 8192) {
                failures.add("Memory ${perfResults.peak_memory_mb}MB exceeds 8GB budget")
            }
            if (perfResults.avg_draw_calls > 3000) {
                failures.add("Draw calls ${perfResults.avg_draw_calls} exceeds 3000 budget")
            }

            if (failures.size() > 0) {
                error("Performance budget exceeded:\n${failures.join('\n')}")
            }

            echo "Performance gate PASSED: ${perfResults.avg_frame_time_ms}ms frame time, " +
                 "${perfResults.peak_memory_mb}MB memory, ${perfResults.avg_draw_calls} draw calls"
        }
    }
    post {
        always {
            // Archive for trend tracking
            archiveArtifacts artifacts: 'TestResults/Perf/**', allowEmptyArchive: true
        }
    }
}

Unreal Engine: Stat Capture Automation

// Pseudocode — adapt to your engine version and stat capture method
//
// PerfBenchmarkTest.cpp — UE5 automated performance capture (illustrative)
#include "Misc/AutomationTest.h"
#include "Engine/World.h"
#include "HAL/PlatformMemory.h"

IMPLEMENT_SIMPLE_AUTOMATION_TEST(FPerfBenchmarkTest,
    "Project.Performance.Benchmark",
    EAutomationTestFlags::ClientContext | EAutomationTestFlags::StressFilter)

bool FPerfBenchmarkTest::RunTest(const FString& Parameters)
{
    // Load the benchmark map
    UWorld* World = AutomationOpenMap(TEXT("/Game/Maps/PerfBenchmark"));
    TestNotNull(TEXT("Benchmark map loaded"), World);

    // --- What to capture ---
    // 1. Frame times: use FEngineLoop tick or stat unit, NOT FPlatformProcess::Sleep
    //    UE5's FAutomationTestBase provides AddCommand() for latent operations
    //    that tick the engine properly between frames
    // 2. Memory: FPlatformMemory::GetStats() for physical memory usage
    // 3. Draw calls: GNumDrawCallsRHI (RHI stat counter)
    //
    // General shape:
    //   - Warm up for ~60 frames (latent commands that tick the engine)
    //   - Sample ~300 frames of real engine ticks
    //   - Collect: frame time (ms), peak memory (MB), draw call count
    //   - Write results to JSON for Jenkins to parse

    // Calculate stats from collected samples
    // float AvgFrameTime = ...; float MaxFrameTime = ...; int32 AvgDrawCalls = ...;

    // Memory stats
    FPlatformMemoryStats MemStats = FPlatformMemory::GetStats();
    float UsedMemoryMB = MemStats.UsedPhysical / (1024.0f * 1024.0f);

    // Log results (parsed by Jenkins)
    UE_LOG(LogTemp, Display, TEXT("PERF_RESULT: avg_frame_time_ms=%.2f"), AvgFrameTime);
    UE_LOG(LogTemp, Display, TEXT("PERF_RESULT: max_frame_time_ms=%.2f"), MaxFrameTime);
    UE_LOG(LogTemp, Display, TEXT("PERF_RESULT: avg_draw_calls=%d"), AvgDrawCalls);
    UE_LOG(LogTemp, Display, TEXT("PERF_RESULT: memory_mb=%.0f"), UsedMemoryMB);

    // Assert budgets
    TestTrue(TEXT("Average frame time within budget"), AvgFrameTime < 16.67f);
    TestTrue(TEXT("No frame spikes above 33ms"), MaxFrameTime < 33.33f);
    TestTrue(TEXT("Draw calls within budget"), AvgDrawCalls < 3000);
    TestTrue(TEXT("Memory within budget"), UsedMemoryMB < 8192.0f);

    // Write results JSON for Jenkins performance gate
    // (your test harness must output this file for the gate to read)

    return true;
}

The exact API depends on your engine version. UE5’s FAutomationTestBase provides AddCommand() for latent operations that tick the engine properly. The pattern above shows the shape of a perf test — capture warmup, sample, calculate, assert.

Unity: Performance Testing Framework

// PerformanceBenchmark.cs — Unity Performance Testing Framework
using System.Collections;
using NUnit.Framework;
using Unity.PerformanceTesting;
using UnityEngine;
using UnityEngine.TestTools;
using UnityEngine.SceneManagement;
using UnityEngine.Profiling;

public class PerformanceBenchmark
{
    [UnityTest, Performance]
    public IEnumerator MainMenuFrameTime()
    {
        SceneManager.LoadScene("MainMenu", LoadSceneMode.Single);
        yield return new WaitForSeconds(3f); // Let scene settle

        // Measure frame time over 300 frames
        using (Measure.Frames()
            .WarmupCount(60)
            .MeasurementCount(300)
            .Scope())
        {
            yield return Measure.Frames().Run();
        }

        // Budget: 60 FPS = 16.67ms
        // The Performance Testing Framework records SampleGroups
        // that can be asserted against in the test results
    }

    [UnityTest, Performance]
    public IEnumerator GameplayMemoryBudget()
    {
        SceneManager.LoadScene("GameplayLevel01", LoadSceneMode.Single);
        yield return new WaitForSeconds(10f);

        long totalMemory = Profiler.GetTotalAllocatedMemoryLong();
        float memoryMB = totalMemory / (1024f * 1024f);

        Measure.Custom(new SampleGroup("TotalMemoryMB"), memoryMB);

        // Assert memory budget
        Assert.Less(memoryMB, 8192f,
            $"Memory usage {memoryMB:F0}MB exceeds 8GB budget");

        yield return null;
    }

    [UnityTest, Performance]
    public IEnumerator LoadTimeWithinBudget()
    {
        float startTime = Time.realtimeSinceStartup;

        SceneManager.LoadScene("GameplayLevel01", LoadSceneMode.Single);
        yield return new WaitUntil(() =>
            SceneManager.GetActiveScene().name == "GameplayLevel01" &&
            SceneManager.GetActiveScene().isLoaded);

        float loadTime = Time.realtimeSinceStartup - startTime;
        Measure.Custom(new SampleGroup("LoadTimeSeconds"), loadTime);

        Assert.Less(loadTime, 15f,
            $"Load time {loadTime:F1}s exceeds 15s budget");
    }
}

Track trends, not just pass/fail. A frame time of 15ms today is fine. But if it was 10ms last week and 12ms the week before, something is trending the wrong direction. Archive your performance results (the Jenkins pipeline above does this) and chart them over time. A simple spreadsheet works. Grafana is better.

Asset-level budgets matter too. A single 4K uncompressed texture shouldn’t blow your memory budget. We’ll cover this in the next section.

Asset Quality Gates

Assets cause as many build problems as code. An artist checks in a 200MB uncompressed TGA, and suddenly the build is 2GB larger and load times doubled. Automated validation catches this before it reaches the build.

Automated Content Validation

#!/usr/bin/env python3
"""
asset_validator.py — Validate game assets against project standards.
Run as a pre-commit check or Jenkins stage.
"""

import os
import sys
import json
from pathlib import Path
from dataclasses import dataclass, field
from typing import List

@dataclass
class ValidationResult:
    path: str
    rule: str
    severity: str  # "error" or "warning"
    message: str

@dataclass
class AssetValidator:
    project_root: Path
    results: List[ValidationResult] = field(default_factory=list)

    # --- Rules (adjust per project) ---
    TEXTURE_RULES = {
        "max_dimension": 4096,
        "must_be_power_of_two": True,
        "allowed_formats": [".png", ".tga", ".exr"],
        "max_file_size_mb": 50,
    }

    MESH_RULES = {
        "max_triangles": {
            "character": 100_000,
            "prop": 25_000,
            "environment": 50_000,
            "weapon": 30_000,
        },
        "require_lods": True,
        "min_lod_count": 2,
    }

    AUDIO_RULES = {
        "allowed_formats": [".wav", ".ogg"],
        "max_sample_rate": 48000,
        "max_file_size_mb": 100,
    }

    NAMING_RULES = {
        "textures": r"^T_[A-Z][a-zA-Z0-9]+_(D|N|R|M|E|AO)$",  # T_AssetName_D (diffuse)
        "meshes": r"^SM_[A-Z][a-zA-Z0-9]+$",                    # SM_AssetName
        "materials": r"^M_[A-Z][a-zA-Z0-9]+$",                   # M_AssetName
        "blueprints": r"^BP_[A-Z][a-zA-Z0-9]+$",                 # BP_AssetName
    }

    def validate_texture(self, filepath: Path):
        """Check texture dimensions, format, and file size."""
        ext = filepath.suffix.lower()
        if ext not in self.TEXTURE_RULES["allowed_formats"]:
            self.results.append(ValidationResult(
                str(filepath), "texture_format", "error",
                f"Invalid format {ext}. Allowed: {self.TEXTURE_RULES['allowed_formats']}"
            ))

        size_mb = filepath.stat().st_size / (1024 * 1024)
        if size_mb > self.TEXTURE_RULES["max_file_size_mb"]:
            self.results.append(ValidationResult(
                str(filepath), "texture_size", "error",
                f"Texture {size_mb:.1f}MB exceeds {self.TEXTURE_RULES['max_file_size_mb']}MB limit"
            ))

        # Dimension checks require PIL/Pillow
        try:
            from PIL import Image
            with Image.open(filepath) as img:
                w, h = img.size
                max_dim = self.TEXTURE_RULES["max_dimension"]
                if w > max_dim or h > max_dim:
                    self.results.append(ValidationResult(
                        str(filepath), "texture_dimension", "error",
                        f"Texture {w}x{h} exceeds {max_dim}x{max_dim} limit"
                    ))
                if self.TEXTURE_RULES["must_be_power_of_two"]:
                    if (w & (w - 1)) != 0 or (h & (h - 1)) != 0:
                        self.results.append(ValidationResult(
                            str(filepath), "texture_pot", "warning",
                            f"Texture {w}x{h} is not power-of-two"
                        ))
        except ImportError:
            pass  # Skip dimension check if PIL not available

    def validate_mesh(self, filepath: Path):
        """Check mesh file size. Poly count checks require trimesh or pyfbx."""
        size_mb = filepath.stat().st_size / (1024 * 1024)
        max_size_mb = 100  # Adjust per project
        if size_mb > max_size_mb:
            self.results.append(ValidationResult(
                str(filepath), "mesh_size", "error",
                f"Mesh {size_mb:.1f}MB exceeds {max_size_mb}MB limit"
            ))
        # TODO: For poly count validation, use trimesh or engine-specific tools:
        # import trimesh
        # mesh = trimesh.load(filepath)
        # if len(mesh.faces) > max_triangles: ...

    def validate_naming(self, filepath: Path):
        """Check asset naming conventions."""
        import re
        name = filepath.stem
        ext = filepath.suffix.lower()

        pattern = None
        if ext in [".png", ".tga", ".exr"]:
            pattern = self.NAMING_RULES.get("textures")
        elif ext in [".fbx", ".obj"]:
            pattern = self.NAMING_RULES.get("meshes")

        if pattern and not re.match(pattern, name):
            self.results.append(ValidationResult(
                str(filepath), "naming_convention", "warning",
                f"'{name}' doesn't match naming convention: {pattern}"
            ))

    def validate_directory(self, directory: Path, extensions: list):
        """Validate all matching assets in a directory."""
        self.total_scanned = 0
        for ext in extensions:
            for filepath in directory.rglob(f"*{ext}"):
                self.total_scanned += 1
                self.validate_naming(filepath)
                if ext in [".png", ".tga", ".exr"]:
                    self.validate_texture(filepath)
                elif ext in [".fbx", ".obj"]:
                    self.validate_mesh(filepath)

    def run(self):
        """Run all validations and return exit code."""
        content_dir = self.project_root / "Content"
        if content_dir.exists():
            self.validate_directory(content_dir,
                [".png", ".tga", ".exr", ".fbx", ".obj", ".wav", ".ogg"])

        # Report
        errors = [r for r in self.results if r.severity == "error"]
        warnings = [r for r in self.results if r.severity == "warning"]

        print(f"\n{'='*60}")
        print(f"Asset Validation: {len(errors)} errors, {len(warnings)} warnings")
        print(f"{'='*60}")

        for r in self.results:
            icon = "ERROR" if r.severity == "error" else "WARN"
            print(f"  [{icon}] {r.path}")
            print(f"         {r.message}")

        # Export JSON for Jenkins
        report = {
            "total_scanned": getattr(self, 'total_scanned', 0),
            "assets_with_findings": len(set(r.path for r in self.results)),
            "errors": len(errors),
            "warnings": len(warnings),
            "details": [vars(r) for r in self.results],
        }
        with open("asset_validation_report.json", "w") as f:
            json.dump(report, f, indent=2)

        return 1 if errors else 0

if __name__ == "__main__":
    project_root = Path(sys.argv[1]) if len(sys.argv) > 1 else Path(".")
    validator = AssetValidator(project_root=project_root)
    sys.exit(validator.run())

Two-Stage Validation: Automated + Human

Not everything can be automated. “Does this character model look right?” is a human question. The pattern:

Automated checks run first. Dimensions, file size, naming conventions. For meshes: file size limits (poly counts require engine-specific tools or libraries like trimesh). Instant, on every commit.
Human review runs second. Art direction, visual quality, animation feel. Only for assets that pass automated checks.

This is where commit tags from Part 1 come back. Use tags like #approval or #review in your commit messages to trigger human review workflows. A Perforce trigger (from Part 2) can parse these tags and route to the right reviewer.

The automated checks save human reviewers from wasting time on obviously broken assets. If a texture is 16K and non-power-of-two, there’s no point asking the art director to review it.

Platform Certification Pre-Checks

Console certification — Sony TRC, Microsoft XR, Nintendo Lotcheck — is the final boss of shipping a game. Fail cert, and you’re looking at weeks of delays. The key insight: treat cert requirements as test cases from day one.

Requirement Category	Sony (TRC)	Microsoft (XR)	Nintendo (Lotcheck)	Automatable?
Controller disconnect handling	✓	✓	✓	Yes
Save data corruption recovery	✓	✓	✓	Yes
Language/locale support	✓	✓	✓	Partially
Achievement/trophy integration	✓	✓	✓	Yes
Network disconnect handling	✓	✓	✓	Yes
Framerate requirements	✓	✓	✓	Yes
Age rating display	✓	✓	✓	Yes
Terms of service display	✓	✓	✓	Yes
User privilege checks	✓	✓	—	Yes
Suspend/resume handling	✓	✓	✓	Partially
Content during loading	—	✓	—	Partially

Most of these can be automated. Controller disconnect? Simulate it programmatically. Save corruption? Write garbage to a save file and verify the game recovers. These tests should run on every Shipping build.

stage('Cert Pre-Check') {
    when {
        expression { params.BUILD_TYPE == 'Shipping' }
    }
    steps {
        script {
            echo "Running platform certification pre-checks for ${params.PLATFORM}..."

            // Run cert-specific test suite
            sh """
                ${env.UE_ROOT}/Engine/Build/BatchFiles/RunUAT.sh RunTests \
                -project="\${WORKSPACE}/GameProject.uproject" \
                -testfilter="Project.Certification.${params.PLATFORM}" \
                -platform=${params.PLATFORM} \
                -configuration=Shipping \
                -reportexportpath="\${WORKSPACE}/TestResults/Cert" \
                -log
            """

            // Platform-specific checks
            if (params.PLATFORM == 'PS5') {
                sh """
                    # Validate TRC-specific requirements
                    # Set TRC_VERSION to match your current SDK requirements
                    python3 scripts/cert_check.py \
                        --platform ps5 \
                        --build-dir "\${WORKSPACE}/Builds/${params.PLATFORM}" \
                        --trc-version "\${TRC_VERSION}" \
                        --report "\${WORKSPACE}/TestResults/Cert/trc_report.json"
                """
            } else if (params.PLATFORM == 'XboxSeriesX') {
                sh """
                    # Validate XR-specific requirements
                    # Set XR_VERSION to match your current SDK requirements
                    python3 scripts/cert_check.py \
                        --platform xbox \
                        --build-dir "\${WORKSPACE}/Builds/${params.PLATFORM}" \
                        --xr-version "\${XR_VERSION}" \
                        --report "\${WORKSPACE}/TestResults/Cert/xr_report.json"
                """
            }
        }
    }
    post {
        always {
            junit 'TestResults/Cert/**/*.xml'
            archiveArtifacts artifacts: 'TestResults/Cert/**', allowEmptyArchive: true
        }
        failure {
            slackSend channel: '#cert-tracking',
                message: "⚠️ Cert pre-check FAILED for ${params.PLATFORM} Shipping build (CL ${params.CHANGELIST})"
        }
    }
}

This stage only runs for Shipping builds — matching the build types from Part 3. There’s no point running cert checks on Development builds.

Start early. Don’t wait until you’re two weeks from submission to discover your game doesn’t handle controller disconnect. Add cert test cases to your nightly suite the moment you have a playable build.

Visual Regression Testing

Screenshot comparison catches visual bugs that functional tests miss: broken shaders, wrong texture assignments, UI layout drift, lighting changes.

The approach: capture screenshots at defined camera positions, compare against known-good reference images using perceptual diff algorithms (SSIM or perceptual hash).

The honest take: visual regression testing is hard to do well. Non-deterministic rendering (TAA, particle systems, dynamic lighting) causes false positives. Different GPUs produce slightly different output. Most sub-AAA studios should prioritize the earlier sections of this post first.

If you do pursue it:

Unreal Engine has built-in screenshot comparison in its automation framework (FAutomationTestScreenshotCompare)
ImageMagick can compute perceptual diffs between screenshots (compare -metric SSIM)
Set generous thresholds (95% similarity rather than 99%) to avoid false positives
Only compare in deterministic scenarios: fixed camera, no particles, no TAA, static lighting
Use it for UI screens first (menus, HUD, inventory) — these are the most deterministic

Visual regression is a “nice to have.” The boot test, performance gates, and cert pre-checks are “need to have.”

How ButterStack Helps

ButterStack doesn’t replace your test framework — it gives you visibility into what’s happening across your entire build-test-deploy pipeline. Here’s how ButterStack fits into this picture:

AI Build Failure Analysis

When a build fails, ButterStack’s AI analyzes the log output and categorizes the failure:

compile_error — Code compilation failures
linker_error — Linking failures (missing symbols, library issues)
asset_cook_failure — Asset cooking/processing failures
shader_error — Shader compilation failures
test_failure — Automated test failures
timeout — Build exceeded time limits
oom — Out of memory during build
dependency_issue — Missing or incompatible dependencies
config_error — Configuration/setup issues

Each failure is rated by severity (critical, high, medium, low), attributed to the developer whose commit likely caused it, and matched against known patterns. Over time, this builds a knowledge base: “shader errors in this project are usually caused by missing permutation defines” becomes a suggestion the next time a similar error appears.

3-Step Build Pipeline

Every build in ButterStack flows through three stages:

External Build — Your Jenkins/CI build runs and reports status back
Asset tracking — Which assets were included, approval status, and cook-time data from engine logs
Approval — Human approval gate with passed_pending_approval status

A build can compile successfully but still sit in passed_pending_approval until a lead reviews the asset changes. This prevents untested or unreviewed content from reaching QA.

Asset Approval Workflows

Individual assets have their own approval lifecycle:

pending — Awaiting review
approved — Cleared by a reviewer
denied — Rejected with feedback
ignored — Explicitly skipped (e.g., placeholder assets)

Approval types include manual, automated, review, and quality_check. ButterStack tracks approval duration — how long assets sit in review — so you can identify bottlenecks in your content pipeline.

Asset Lineage

When QA finds a bug, the first question is “who changed this?” ButterStack traces from commit to asset to build. Assets are linked to the changelists and commits that modified them and the build runs that included them. When your task tracker is integrated, this chain extends back to the originating task.

This is the traceability chain from Part 1 made concrete: Task → Commit → Build → Asset → Approval.

Game Engine Build Observation

For Unreal and Unity builds, ButterStack tracks engine-specific metrics:

Per-asset cook tracking (which assets take longest to cook)
DDC (Derived Data Cache) hit rates (referencing the shared DDC from Part 3)
Shader permutation counts
Compression ratios (source vs. cooked size)
Build phase tracking: initializing → cooking → compiling shaders → packaging → archiving

This turns “the build took 3 hours” into “the build took 3 hours because shader compilation took 90 minutes and 47 assets had zero DDC hits.”

Coming soon: Asset optimization recommendations, lineage visualization, and real-time activity tracking are in development.

Get started with ButterStack →

FAQ

We have zero automated tests. Where do we start?

The boot test. Write one test that launches your game and waits for the main menu to appear. Run it on every build. This single test catches missing assets, broken initialization, shader compile failures, and crash-on-launch bugs. You can write it in an afternoon. Everything else builds on this foundation.

How do you test multiplayer/networked gameplay?

Automated multiplayer testing requires launching multiple game instances and coordinating between them. Unreal’s Gauntlet supports multi-process tests. Unity requires custom harnesses. Start with connection tests (can two clients connect and see each other?), then add gameplay tests (can they interact?). Dedicated server builds make this easier since you can run headless server instances on CI machines.

Our tests are flaky. They pass sometimes and fail other times.

Quarantine flaky tests — move them to a separate suite that runs but doesn’t fail the build. Never ignore them. Investigate and fix the root cause. Common culprits: timing-dependent assertions (use WaitForCondition instead of WaitForSeconds), shared state between tests, non-deterministic physics, and file system race conditions. A flaky test that you ignore will hide a real failure eventually.

Should QA write automated tests?

QA should define test cases. Whether they write the automation depends on their skillset. Some QA engineers write excellent automation. Others are better at exploratory testing and edge-case discovery. The best setup: QA defines what to test, engineers automate the repetitive cases, QA focuses their manual time on what can’t be automated — game feel, art review, edge cases.

How long should test suites take?

Suite	Budget	Rationale
Unit tests	< 5 min	Must run on every commit without blocking developers
Smoke/boot	< 10 min	Runs on every build; fast enough to not delay feedback
Integration	< 30 min	Nightly; can afford more time
Full suite	< 2 hours	Release candidates; thoroughness over speed

If your unit tests take more than 5 minutes, split them. If your full suite takes more than 2 hours, parallelize across multiple machines.

Is manual QA still needed?

Absolutely. Automated tests answer “does it work?” Manual QA answers “is it good?” Game feel, art direction, difficulty balance, user experience — these are human judgments. Automation frees QA from repetitive regression testing so they can focus on the subjective quality questions that only humans can answer. The best QA teams spend 80% of their time on exploration and 20% on regression.

How do we handle test data and save files?

Generate test data deterministically. Don’t rely on save files from someone’s dev machine — they’ll be out of date within a week. Write test fixtures that create known game states: a save at level 5 with specific inventory, a profile with specific settings. Store these as test assets in your repository. Regenerate them as part of your test setup.

What’s Next

Your builds now compile, get tested, pass performance gates, and survive cert pre-checks. But a tested build sitting on a Jenkins server isn’t shipping. In Part 5, we’ll cover deployment: Steam uploads, console submission workflows, staging environments, and rollback strategies. The last mile from “build approved” to “players can download it.”

Conclusion

The traceability chain from Part 1 was:

Task → Commit → Build → Test → Deploy

We’ve spent this post filling in the Test link. Without it, your chain is:

Task → Commit → Build → ??? → Hope for the best

Here’s what to do next, in order:

Write a boot test. Today. One test that launches your game and verifies the main menu appears. This alone is transformative.
Add it to Jenkins. Use the pipeline stages from this post. Run it on every build.
Define performance budgets. Write down your frame time, memory, and load time targets. Even if you don’t automate the checks yet, having the numbers written down changes how your team thinks about performance.
Automate asset validation. Start with texture sizes and naming conventions. Low effort, high payoff.
Add cert checks to Shipping builds. Don’t wait until submission month.

Every bug your QA team finds should become a test case. That crash in the tutorial? Write a test that plays through the tutorial. That performance regression from the uncompressed textures? Add a texture size check. Over time, your automated suite becomes a living record of every mistake you’ve caught — and a guarantee you’ll never ship that mistake again.

Start small. Build up. A single boot test is infinitely better than no tests at all.

Thanks!
Ryan L’Italien
Founder and CEO of ButterStack

Ryan L'Italien

Want to see what pipeline observability looks like? Try ButterStack free and connect your first integration in minutes.

Or just email me at: ryan@butterstack.com.