How to set up a Playwright testing framework that doesn't fall apart at scale?

Written by: Anastasiia Sokolinska, Chief Operating Officer

Posted: 23.06.2025

Most test automation projects break down in year two, not year one. At 50 tests, the setup looks manageable. By the time the suite hits 300, auth logic is buried in every beforeEach, environment configs have drifted across three separate files, and CI takes 40 minutes without a clear reason why. The framework wasn't designed badly. It was built reactively, one test at a time, without architectural intent from the start.

This guide covers how to set up a Playwright testing framework the way senior engineers do it: starting with the decisions that determine long-term maintainability, then building each layer — config, structure, POM, fixtures, and CI — so nothing needs to be rebuilt at 200 tests.

Before you run npm init — decisions that lock in your architecture

The npm init playwright@latest command takes 30 seconds. The decisions you make before running it shape whether your framework survives to 300 tests.

Three choices deserve explicit answers before you write a single line of code.

TypeScript or JavaScript? TypeScript, always, for any team with more than one contributor maintaining tests over time. The reason isn't aesthetics. Type safety in fixture composition catches mismatches between fixture definitions and their consumers at compile time, not at 3am when CI fails. IDE support for page object refactoring means renaming a method in BasePage.ts propagates automatically instead of breaking 40 test files silently. For mid-market SaaS teams specifically, the maintenance overhead of untyped fixture chains compounds fast.

Standalone repo or monorepo? The realistic options are a dedicated qa-automation repo or a tests/ directory inside the main application repo. A standalone repo gives clearer ownership, simpler CI wiring, and prevents test engineers from accidentally importing application internals. A tests/ directory inside the app repo makes environment variables accessible without extra configuration and tightens the feedback loop when UI and test code change together. The trade-off is proximity to source versus separation of concerns. For teams where QA and development run on different release cycles, the standalone repo wins. For teams where tests and features ship together, co-location reduces friction.

One playwright.config.ts or many? One, always. Separate config files per environment (staging.config.ts, prod.config.ts) seem like a clean abstraction until they diverge. Someone updates the timeout in staging's config and forgets production. A new projects entry gets added to one file and not the other. A single environment-aware config driven by process.env.TEST_ENV gives you one source of truth and makes every environment switch a one-variable change. The rest of this article builds on that assumption.

If you've just taken over a repository where these decisions were never made explicitly, you'll find the consequences scattered across the codebase: magic strings where base URLs should be, beforeEach blocks handling auth that should live in a fixture, and a CI config that runs playwright test with no environment context. That's the shape of a framework assembled without a plan. Everything that follows is what a plan looks like.

Starting a new Playwright project and want the architecture right before the first test is written? DeviQA's automation engineers set up production-ready frameworks as part of our QA service engagements.

Project structure: what goes where and why

Every framework tutorial shows a folder tree. Few explain what breaks when you ignore it.

[Figure: Playwright framework readiness checklist showing seven criteria for a production-ready test automation setup]

Here's the structure that holds up past 300 tests:

playwright-framework/
├── tests/               # Spec files only — no logic, no data setup
├── pages/               # Page Object Model classes, one per route or feature area
├── fixtures/            # Custom fixture definitions: auth, API seeding, page composition
├── utils/               # Pure functions: data generators, date helpers, API clients
├── config/              # Environment maps, base URL resolution, feature flag toggles
├── auth/                # Saved auth state (gitignored — never commit credentials)
├── .env.staging         # Environment-specific secrets — never committed
├── playwright.config.ts # Single source of truth for all environments
└── package.json

The separation of concerns principle here is specific: spec files should read like test descriptions, not setup scripts. If a test file imports an API client directly, creates its own data, and handles its own auth, it's doing four jobs at once. That's the pattern you'll find in a suite with auth scattered across every beforeEach — and it's also why adding a new authenticated role to that suite means touching 40 files instead of one fixture.

The anti-patterns that break frameworks at scale follow directly from ignoring this structure. Test data hardcoded in spec files makes it impossible to run the same test against staging and production without manual edits. No fixtures/ directory means auth logic lives in beforeEach hooks that get copied, drift, and eventually contradict each other. Mixing utils/ with pages/ means a refactoring task that should touch five files touches twenty.
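As an illustration of what moving test data out of spec files can look like, here is a minimal sketch of a data builder that could live in utils/. The name buildUser and the displayName field are assumptions made for this example, not part of any existing API; only the member role and the seed-…@example.com pattern come from the fixtures shown later in this article.

```typescript
// utils/build-user.ts (hypothetical helper)

type TestUser = {
  email: string;
  role: 'member' | 'admin';
  displayName: string;
};

// Counter scoped to the current process; each Playwright worker gets its own copy.
let userSequence = 0;

// Returns a user payload with a unique email. Callers override only the
// fields a given test actually cares about.
function buildUser(overrides: Partial<TestUser> = {}): TestUser {
  userSequence += 1;
  // Timestamp + counter + random suffix keeps emails distinct across parallel workers.
  const suffix = `${Date.now()}-${userSequence}-${Math.random().toString(36).slice(2, 8)}`;
  return {
    email: `seed-${suffix}@example.com`,
    role: 'member',
    displayName: `Test User ${userSequence}`,
    ...overrides,
  };
}
```

A spec or fixture would then call buildUser({ role: 'admin' }) instead of spelling out a literal payload, so the same test data works against any environment.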

The folder tree is not bureaucracy. It's the scaffolding that makes the fixture composition section below possible.

playwright.config.ts: the right way to wire it for multiple environments

This is where most frameworks make their first significant mistake: a single hardcoded base URL, no environment switching, and retry settings that are the same locally and in CI.

Here's a production-grade config that handles all three environments without duplication:

// playwright.config.ts

import { defineConfig, devices } from '@playwright/test';
import * as dotenv from 'dotenv';

const ENV = process.env.TEST_ENV ?? 'staging';

dotenv.config({ path: `.env.${ENV}` });

const BASE_URLS: Record<string, string> = {
  dev: 'http://localhost:3000',
  staging: 'https://staging.yourapp.com',
  production: 'https://yourapp.com',
};

export default defineConfig({
  testDir: './tests',
  fullyParallel: true,

  // Prevent focused tests from being accidentally committed to CI
  forbidOnly: !!process.env.CI,

  // Retries in CI catch genuine flakiness. Locally, they mask real failures.
  retries: process.env.CI ? 2 : 0,

  // workers: 1 in CI for stability. As written, this applies to sharded runs too.
  // See the CI section for when to introduce sharding.
  workers: process.env.CI ? 1 : undefined,

  // blob reporter is mandatory for shard merging — do not use html in CI
  reporter: process.env.CI ? 'blob' : 'html',

  // Global setup runs once before all tests to generate auth/user.json
  globalSetup: './global-setup',

  use: {
    baseURL: BASE_URLS[ENV] ?? BASE_URLS['staging'],

    // Default auth state for all tests — overridable per test via fixture
    storageState: 'auth/user.json',

    // Capture trace on first retry so you have Trace Viewer data for failures
    trace: 'on-first-retry',

    screenshot: 'only-on-failure',
  },

  projects: [
    {
      // Primary CI target
      name: 'chromium',
      use: { ...devices['Desktop Chrome'] },
    },
    {
      // Regression suite
      name: 'firefox',
      use: { ...devices['Desktop Firefox'] },
    },
    {
      // Optional — add when WebKit coverage is a product requirement
      name: 'webkit',
      use: { ...devices['Desktop Safari'] },
    },
  ],
});

A few decisions worth naming explicitly. Setting retries: 0 locally matters: if a test is flaky in your local environment, you want it to fail, not quietly pass on the second attempt. A flaky rate above 2% signals that CI trust is already eroding — retries in local dev make that threshold invisible until the problem is much larger.
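The 2% figure is easier to act on once the measurement is pinned down. The sketch below assumes one possible definition: the share of executed tests that failed at least once and then passed on a retry. The outcome labels and function names are hypothetical and chosen for this example; adapt them to whatever your reporter emits.

```typescript
// Hypothetical outcome labels for one CI run.
type TestOutcome = 'passed' | 'failed' | 'flaky' | 'skipped';

// Share of executed (non-skipped) tests that only passed after a retry.
function flakyRate(outcomes: TestOutcome[]): number {
  const executed = outcomes.filter((outcome) => outcome !== 'skipped');
  if (executed.length === 0) return 0;
  const flakyCount = executed.filter((outcome) => outcome === 'flaky').length;
  return flakyCount / executed.length;
}

// True when the run's flaky rate is above the threshold (2% by default).
function exceedsFlakyThreshold(outcomes: TestOutcome[], threshold = 0.02): boolean {
  return flakyRate(outcomes) > threshold;
}
```

Under this definition, a 300-test run with 9 tests that needed a retry sits at 3% and would trip the check.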

The reporter: 'blob' setting in CI is not optional if you're using sharding. The blob reporter writes each shard's raw results in a format that the merge-reports command can combine across shards. Switch to html in CI and each shard produces its own standalone HTML report, leaving the merge step nothing to combine. Allure is a reasonable alternative reporter for teams that want richer dashboards, but it requires a separate merge pipeline and is worth adding only after the base setup is stable.

Switch environments by changing a single variable: TEST_ENV=production npx playwright test. Nothing else changes.
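One caveat with a lookup-plus-fallback approach: a mistyped value such as TEST_ENV=prod quietly resolves to staging. A stricter variant is sketched below. The names resolveBaseUrl, EnvName, and ENV_BASE_URLS are hypothetical, and the URLs are the same placeholders used in the config above.

```typescript
// config/resolve-base-url.ts (hypothetical helper, not part of Playwright)

type EnvName = 'dev' | 'staging' | 'production';

const ENV_BASE_URLS: Record<EnvName, string> = {
  dev: 'http://localhost:3000',
  staging: 'https://staging.yourapp.com',
  production: 'https://yourapp.com',
};

// Type guard: narrows an arbitrary string to a known environment name.
function isEnvName(value: string): value is EnvName {
  return Object.prototype.hasOwnProperty.call(ENV_BASE_URLS, value);
}

// Resolves the base URL for a TEST_ENV value. An unset variable falls back to
// staging, but an unrecognised value throws instead of silently using the fallback.
function resolveBaseUrl(testEnv: string | undefined): string {
  const envName = testEnv ?? 'staging';
  if (!isEnvName(envName)) {
    const known = Object.keys(ENV_BASE_URLS).join(', ');
    throw new Error(`Unknown TEST_ENV "${envName}". Expected one of: ${known}`);
  }
  return ENV_BASE_URLS[envName];
}
```

In playwright.config.ts this could be wired in as baseURL: resolveBaseUrl(process.env.TEST_ENV), so a typo fails the run at startup rather than pointing tests at the wrong environment.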

If your team is already maintaining separate config files per environment, that's the first thing worth fixing before the suite grows further.

Page Object Model: what it actually looks like in TypeScript

The standard POM tutorial shows a LoginPage with a username field, a password field, and a submit button. That tells you the concept, not the pattern.

Real page objects in a SaaS application share behavior: every page needs to wait for network activity, assert toast notifications, and handle navigation. Without a shared base class, those methods get copied into every page object, diverge slightly over time, and require the same fix applied in twelve places when the toast component changes.

Start with a BasePage that every other page class inherits from:

// pages/BasePage.ts

import { Page, expect, Locator } from '@playwright/test';

export class BasePage {
  constructor(protected page: Page) {}

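  async navigate(path: string): Promise<void> {
    await this.page.goto(path);
    // Playwright's docs discourage relying on 'networkidle'. Keep this wait only if
    // the app has no polling or analytics traffic; otherwise assert on visible UI.
    await this.page.waitForLoadState('networkidle');
  }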

  async assertToastMessage(expected: string): Promise<void> {
    const toast: Locator = this.page.getByRole('alert');
    await expect(toast).toContainText(expected);
  }

  async waitForNetworkIdle(): Promise<void> {
    await this.page.waitForLoadState('networkidle');
  }
}

Then extend it for each feature area:

// pages/DashboardPage.ts

import { Page, expect, Locator } from '@playwright/test';
import { BasePage } from './BasePage';

export class DashboardPage extends BasePage {
  private readonly welcomeBanner: Locator;
  private readonly projectList: Locator;

  constructor(page: Page) {
    super(page);
    this.welcomeBanner = this.page.getByTestId('welcome-banner');
    this.projectList = this.page.getByRole('list', { name: 'projects' });
  }

  async assertWelcomeMessage(name: string): Promise<void> {
    await expect(this.welcomeBanner).toContainText(name);
  }

  async expectProjectCount(count: number): Promise<void> {
    await expect(this.projectList.getByRole('listitem')).toHaveCount(count);
  }
}

Note what's not here: no goto() calls inside spec files, no locators defined inline in tests, no page.locator('.dashboard-list > li') scattered across a dozen test files. When the projects list component gets refactored, you update DashboardPage.ts and nothing else breaks.

One note on when not to use POM: simple smoke tests or single-assertion health checks don't benefit from the abstraction overhead. If a test file will never exceed three assertions and has no shared state with other tests, writing a page object for it creates more maintenance than it prevents. Use the pattern where it earns its cost.

Fixtures: the layer most teams skip and then rebuild

If there's one architectural decision that separates a tutorial-level framework from a production one, it's fixture composition. Teams that skip fixtures end up with auth logic copy-pasted across beforeEach blocks, test data leaking between tests, and spec files that read like setup scripts rather than test descriptions.

Three fixture patterns belong in any SaaS application framework from day one.

The storageState auth fixture

The goal is to log in once, save the session to auth/user.json, and inject that authenticated state into every test that needs it without a single beforeEach. The mechanism is a global setup script combined with a fixture that reads the saved state:

// global-setup.ts

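import { chromium, type FullConfig } from '@playwright/test';

async function globalSetup(config: FullConfig): Promise<void> {
  // Reuse the baseURL resolved in playwright.config.ts; no separate BASE_URL variable is set anywhere
  const { baseURL } = config.projects[0].use;

  const browser = await chromium.launch();
  const context = await browser.newContext();
  const page = await context.newPage();

  await page.goto(`${baseURL}/login`);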
  await page.getByLabel('Email').fill(process.env.TEST_USER_EMAIL!);
  await page.getByLabel('Password').fill(process.env.TEST_USER_PASSWORD!);
  await page.getByRole('button', { name: 'Sign in' }).click();
  await page.waitForURL('**/dashboard');

  // Save session cookies and localStorage to disk
  await context.storageState({ path: 'auth/user.json' });

  await browser.close();
}

export default globalSetup;

The playwright.config.ts line globalSetup: './global-setup' runs this once before the test suite starts. The use: { storageState: 'auth/user.json' } line in the config applies that saved state to every test by default.

If you've inherited a suite where 80 tests each call page.goto('/login') and fill credentials inside beforeEach, replacing all of that with this pattern is one of the highest-leverage refactors available.
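For local runs, it can also help to know whether a previously saved auth/user.json is still usable before logging in again. The sketch below is one way to check, assuming the session lives in a cookie; the cookie name passed in and the helper name isAuthStateStale are hypothetical. It relies only on the cookies array of the saved storage state, where expires is a Unix timestamp in seconds and -1 marks a session cookie.

```typescript
// Only the parts of a saved storage state file that this helper reads.
type SavedCookie = { name: string; expires: number };
type SavedStorageState = { cookies: SavedCookie[] };

// True when the named cookie is missing or has already expired.
// `nowMs` is the current time in milliseconds (for example Date.now()).
function isAuthStateStale(
  state: SavedStorageState,
  cookieName: string,
  nowMs: number,
): boolean {
  const cookie = state.cookies.find((candidate) => candidate.name === cookieName);
  if (!cookie) return true;
  // -1 means the cookie has no expiry date, so it cannot be judged stale by time.
  if (cookie.expires === -1) return false;
  return cookie.expires * 1000 <= nowMs;
}
```

A global setup script could parse the existing file, call this with Date.now(), and skip the login steps when it returns false. Whether that shortcut is safe depends on how your application invalidates sessions server-side.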

The API seeding fixture

Flaky tests are often a state problem, not a timing problem. When tests share data that one test creates and another test expects to find, execution order matters and parallelism breaks things. The fix is test-scoped data that gets created before the test and destroyed after it:

// fixtures/seed.fixture.ts

import { test as base } from '@playwright/test';

type SeedFixtures = {
  seededUser: { id: string; email: string };
};

export const test = base.extend<SeedFixtures>({
  seededUser: async ({ request }, use) => {
    // Create isolated test data via internal API
    const response = await request.post('/api/test/users', {
      data: { email: `seed-${Date.now()}@example.com`, role: 'member' },
    });

    const user = await response.json();

    // Hand the data to the test
    await use(user);

    // Teardown runs automatically after the test completes
    await request.delete(`/api/test/users/${user.id}`);
  },
});

The use() call is the boundary: everything before it is setup, everything after it is teardown. Playwright guarantees the teardown runs even when the test fails. This is the pattern that eliminates the category of failures caused by dirty state from a previous test run.
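To make the setup/teardown boundary concrete, here is a simplified stand-in for a fixture runner. It is not Playwright's implementation; runWithFixture, FixtureFn, and the demo fixture are hypothetical names for this sketch. It only shows why code placed after use() still runs when the test body throws.

```typescript
// A fixture receives a `use` callback: code before `await use(...)` is setup,
// code after it is teardown.
type FixtureFn<T> = (use: (value: T) => Promise<void>) => Promise<void>;

// Runs a test body inside a fixture. A failure in the test body is captured so
// the fixture can finish its teardown, then rethrown to the caller.
async function runWithFixture<T>(
  fixture: FixtureFn<T>,
  testBody: (value: T) => Promise<void>,
): Promise<void> {
  let testFailed = false;
  let testError: unknown;

  await fixture(async (value) => {
    try {
      await testBody(value);
    } catch (error) {
      testFailed = true;
      testError = error;
    }
  });

  if (testFailed) throw testError;
}

// Demo fixture that records the order of events instead of calling a real API.
const fixtureEvents: string[] = [];
const demoUserFixture: FixtureFn<{ id: string }> = async (use) => {
  fixtureEvents.push('create user'); // setup
  await use({ id: 'u1' });           // test body runs here
  fixtureEvents.push('delete user'); // teardown
};
```

Running a failing test body through runWithFixture(demoUserFixture, ...) still records the delete step before the error reaches the caller, which is the behavior the seeding fixture depends on.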

Composing fixtures into a single import

This is where the architecture pays off. Rather than importing authFixture and seedFixture and DashboardPage separately in each spec file, you compose them once:

// fixtures/index.ts

import { test as base, expect, type Page } from '@playwright/test';
import { DashboardPage } from '../pages/DashboardPage';

type MyFixtures = {
  authenticatedPage: Page;
  seededUser: { id: string; email: string };
  dashboardPage: DashboardPage;
};

export const test = base.extend<MyFixtures>({
  authenticatedPage: async ({ browser }, use) => {
    const context = await browser.newContext({ storageState: 'auth/user.json' });
    const page = await context.newPage();
    await use(page);
    await context.close();
  },

  seededUser: async ({ request }, use) => {
    const res = await request.post('/api/test/users', {
      data: { email: `seed-${Date.now()}@example.com` },
    });
    const user = await res.json();
    await use(user);
    await request.delete(`/api/test/users/${user.id}`);
  },

  // dashboardPage depends on authenticatedPage — Playwright resolves the dependency graph
  dashboardPage: async ({ authenticatedPage }, use) => {
    await use(new DashboardPage(authenticatedPage));
  },
});

export { expect };

Every spec file then imports just two things:

import { test, expect } from '../fixtures';

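test('dashboard greets the signed-in user', async ({ dashboardPage, seededUser }) => {
  // seededUser is created before this test and deleted after it
  await dashboardPage.navigate('/dashboard');
  // The session comes from auth/user.json, so the banner reflects TEST_USER_EMAIL,
  // not the seeded user (assumes the banner displays the account email)
  await dashboardPage.assertWelcomeMessage(process.env.TEST_USER_EMAIL!);
});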

The spec file declares what it needs. The fixture layer handles everything else.

This is the pattern that separates a framework you can hand to five engineers from a framework only one person understands. Spec files should not orchestrate setup.

Wiring up GitHub Actions: treat CI as day-one infrastructure

A framework that isn't CI-ready from the first commit will be retrofitted later. Retrofitting CI means touching every configuration assumption made during development and finding out which ones break at scale. Start with a production-grade CI setup before you have enough tests to need it.

The minimal working workflow: npm ci, npx playwright install --with-deps, npx playwright test, artifact upload. That covers the first 4 weeks.

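The production version adds sharding and browser caching. Browser binary downloads run 200 to 400MB depending on the browser; caching them can cut 60 to 90 seconds from a CI run. The workflow below keys the cache on the lockfile hash, so the cache refreshes whenever dependencies, including the Playwright version, change. Standard GitHub-hosted Ubuntu runners have between 2 and 4 CPU cores depending on the repository type, which matters when you're deciding how many workers to run per shard.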

# .github/workflows/playwright.yml

name: Playwright Tests

on:
  push:
    branches: [main]
  pull_request:
    branches: [main]

jobs:
  test:
    name: "Tests — shard ${{ matrix.shardIndex }}/${{ matrix.shardTotal }}"
    runs-on: ubuntu-latest
    strategy:
      fail-fast: false
      matrix:
        shardIndex: [1, 2, 3, 4]
        shardTotal: [4]
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: 20
      - name: Cache Playwright browsers
        uses: actions/cache@v4
        id: playwright-cache
        with:
          path: ~/.cache/ms-playwright
          key: playwright-${{ hashFiles('**/package-lock.json') }}
      - name: Install dependencies
        run: npm ci
      - name: Install Playwright browsers
        if: steps.playwright-cache.outputs.cache-hit != 'true'
        run: npx playwright install --with-deps
      - name: Install system dependencies only
        if: steps.playwright-cache.outputs.cache-hit == 'true'
        run: npx playwright install-deps
      - name: Run tests
        run: npx playwright test --shard=${{ matrix.shardIndex }}/${{ matrix.shardTotal }}
        env:
          TEST_ENV: staging
          TEST_USER_EMAIL: ${{ secrets.TEST_USER_EMAIL }}
          TEST_USER_PASSWORD: ${{ secrets.TEST_USER_PASSWORD }}
      - name: Upload blob report
        if: always()
        uses: actions/upload-artifact@v4
        with:
          name: blob-report-${{ matrix.shardIndex }}
          path: blob-report
          retention-days: 1

  merge-reports:
    name: Merge reports
    if: always()
    needs: [test]
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: 20
      - name: Install dependencies
        run: npm ci
      - name: Download blob reports
        uses: actions/download-artifact@v4
        with:
          path: all-blob-reports
          pattern: blob-report-*
          merge-multiple: true
      - name: Merge into HTML report
        run: npx playwright merge-reports --reporter html ./all-blob-reports
      - name: Upload HTML report
        uses: actions/upload-artifact@v4
        with:
          name: html-report
          path: playwright-report
          retention-days: 14

As written, the workers: 1 setting in playwright.config.ts applies to every CI run, sharded or not. That is the safe default for stability. Once you shard, each shard runs only its portion of the suite, and you can let it use more than one worker by passing --workers on the command line or relaxing the config value. The practical rule: keep workers: 1 in CI until your serial run time exceeds 10 minutes, then introduce sharding on 4 shards. The overhead of spinning up four GitHub Actions runners is not worth it below that threshold.
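The 10-minute rule can be written down as a small planning helper. This is a rough sketch under stated assumptions: tests split evenly across shards, and each shard pays a fixed startup cost. The 1.5-minute default for that cost is an illustrative guess, not a measured figure, and planCiRun is a hypothetical name.

```typescript
type CiPlan = { shards: number; estimatedMinutes: number };

// Applies the rule of thumb: stay serial up to 10 minutes, otherwise use 4 shards.
// `perShardOverheadMinutes` covers runner startup, npm ci, and browser setup.
function planCiRun(serialMinutes: number, perShardOverheadMinutes = 1.5): CiPlan {
  if (serialMinutes <= 10) {
    return { shards: 1, estimatedMinutes: serialMinutes };
  }
  const shards = 4;
  return {
    shards,
    estimatedMinutes: serialMinutes / shards + perShardOverheadMinutes,
  };
}
```

Under these assumptions, a 40-minute serial suite lands at roughly 11.5 minutes on 4 shards, which suggests that a suite of that size needs more shards or more workers per shard to get under 10 minutes.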

The blob reporter is non-negotiable in the sharded setup. Each shard writes its blob to blob-report/. The merge-reports job collects all four blobs and produces a single HTML report. Switch any shard to the html reporter and that shard uploads no blob, so its results either go missing from the merged report or the merge step fails outright.

One hard requirement for CI: forbidOnly must be on. The config above enables it through forbidOnly: !!process.env.CI, and GitHub Actions sets CI=true automatically, so a test.only() call that reaches the main branch fails the run instead of quietly limiting CI to that one test. It has happened to every team that didn't enforce it.

Framework readiness checklist: how to know you're done

A framework is ready to hand off to a team when it passes all seven of these conditions — not six, not "most of them":

1. npx playwright test runs clean on a fresh clone with no manual steps beyond npm ci and setting the env variables documented in a .env.example file.

2. Tests pass against both staging and production by changing a single environment variable: TEST_ENV=production.

3. Auth state is handled by the global setup and storageState fixture; no beforeEach block in any spec file performs a login.

4. CI runs in under 10 minutes with sharding across 4 shards, or in under 10 minutes serially for suites under 100 tests.

5. A failing test produces a Trace Viewer artifact automatically via trace: 'on-first-retry', so engineers can diagnose failures without rerunning locally.

6. forbidOnly: true and retries: 2 are active in the CI environment.

7. The pages/, fixtures/, and utils/ directories each contain at least one real, populated class, not empty placeholder files that signal the structure was created but not followed.

Point seven is worth naming explicitly. The most common failure mode after a framework setup article is teams who create the folder structure, then proceed to write every test in tests/ without ever populating fixtures/ or pages/. The structure is only useful if the conventions are followed. A readiness review before a team starts writing tests in earnest is worth an hour of your time.

The framework is the foundation, not the feature

Frameworks assembled reactively — test by test, problem by problem — accumulate decisions that contradict each other. The config has three different base URL strings. Auth lives in beforeEach and also in a fixture someone added in month three. CI was added last and runs serially because nobody had time to wire sharding properly.

The cost of that approach isn't visible at 80 tests. By 300, it owns your maintenance schedule.

The architectural decisions covered here — TypeScript from the start, a single environment-aware config, fixture-driven auth and data seeding, CI as day-one infrastructure — each cost an hour at setup time. Together, they determine whether your framework is something your team trusts and extends, or something they work around.

If you're inheriting a framework that doesn't match any of this — or starting from scratch and want the architecture right the first time — DeviQA's test automation engineers build production-ready Playwright setups as part of our QA service engagements.

About the author

Anastasiia Sokolinska

Chief Operating Officer

Anastasiia Sokolinska is the Chief Operating Officer at DeviQA, responsible for operational strategy, delivery performance, and scaling QA services for complex software products.