HTML to PDF Generation using Puppeteer: From Basics to Advanced

Introduction

Converting HTML to PDF is essential for tasks such as generating invoices, digital receipts, or high-quality reports. With Puppeteer, a Node.js library that provides a robust API for headless Chrome/Chromium, we can produce consistently clean and professional PDFs that mirror the layout and styling of our web content. In this guide, I take you through the basics of Puppeteer’s HTML-to-PDF generation, walk through advanced customizations, and then scale it up with Docker, LocalStack, and AWS Lambda (deployed with the AWS SAM CLI) so it’s ready for a production environment. I’ll sprinkle in tips, best practices, and gotchas I’ve learned from building real-world PDF generation pipelines.

1. The Basics of Puppeteer PDF Generation

Puppeteer makes PDF generation simple. Under the hood, it launches a headless browser, renders your HTML content exactly as Chrome would, and exports the resulting layout as a PDF. This ensures layout fidelity, including CSS styling, responsive designs, and web fonts (if properly loaded).

Let’s jump into a minimal example. We provide Puppeteer with some inline HTML, tell it to render the page (allowing enough time for resources to load), and then produce a simple PDF:

Minimal HTML-to-PDF Example

import puppeteer from 'puppeteer';

async function generatePdfFromHtml(html: string, outputPath: string) {
  // Launch a headless browser
  const browser = await puppeteer.launch();
  try {
    const page = await browser.newPage();

    // Set the page content; 'networkidle0' waits until there have been
    // no network connections for 500 ms, so images and fonts can finish loading
    await page.setContent(html, { waitUntil: 'networkidle0' });

    // Generate the PDF and write it to outputPath
    await page.pdf({ path: outputPath });
  } finally {
    // Always close the browser, even if rendering fails, so no Chromium process is leaked
    await browser.close();
  }
}

// Usage example
(async () => {
  const sampleHtml = `<html>
    <head>
      <title>Sample PDF</title>
    </head>
    <body>
      <h1>Hello, Puppeteer!</h1>
      <p>This is a PDF generated from a simple HTML.</p>
    </body>
  </html>`;

  await generatePdfFromHtml(sampleHtml, 'sample.pdf');
  console.log('PDF generated successfully!');
})().catch((err) => {
  console.error('PDF generation failed:', err);
  process.exit(1);
});

This snippet is perfect for scenarios where you only need a straightforward PDF. But as soon as you want to add complex layout or incorporate real data, Puppeteer’s additional configuration options and automation features become indispensable.

2. Advanced Puppeteer PDF Options

Puppeteer’s page.pdf() method provides a wealth of options:

Advanced PDF Options

await page.pdf({
  path: 'custom.pdf',
  format: 'A4',
  printBackground: true,
  margin: {
    top: '1cm',
    right: '1cm',
    bottom: '1cm',
    left: '1cm',
  },
  pageRanges: '1-2', // specify pages to include
});

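format: A named paper size such as A4, Letter, or Tabloid (Puppeteer defaults to Letter). For custom dimensions, omit format and set width and height instead (e.g., width: '1920px', height: '1080px'), because format takes priority over width and height when both are given.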
printBackground: Ensures that CSS backgrounds, colors, and images are included.
pageRanges: Print only the listed pages, using a string such as '1-2' or '1, 3-5' (the example above keeps the first two pages).
margin: Fine-tune margin sizes to suit your layout or comply with official print guidelines.

By mixing and matching these options, you can create a variety of specialized outputs, from narrow receipt-style prints to large-format pages.
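For receipt-style output, width and height take the place of format. The sketch below is a hypothetical helper (receiptPdfOptions is our own name, and PdfOptionsSubset is a hand-written subset of Puppeteer's PDFOptions so the example runs without Puppeteer installed) that builds options for an 80 mm wide receipt page. Puppeteer accepts px, in, cm, and mm units in these strings.

```typescript
// Hand-written subset of Puppeteer's PDFOptions (assumption: only the fields used here).
interface PdfOptionsSubset {
  width?: string;
  height?: string;
  printBackground?: boolean;
  margin?: { top?: string; right?: string; bottom?: string; left?: string };
}

// Hypothetical helper: options for a custom-sized page such as a receipt.
// `format` is left out on purpose, because Puppeteer gives it priority over width/height.
function receiptPdfOptions(widthMm: number, heightMm: number, marginMm = 4): PdfOptionsSubset {
  if (!(widthMm > 0 && heightMm > 0 && marginMm >= 0)) {
    throw new RangeError('Page dimensions must be positive and margins non-negative');
  }
  if (2 * marginMm >= Math.min(widthMm, heightMm)) {
    throw new RangeError('Margins leave no printable area');
  }
  const m = `${marginMm}mm`;
  return {
    width: `${widthMm}mm`,
    height: `${heightMm}mm`,
    printBackground: true,
    margin: { top: m, right: m, bottom: m, left: m },
  };
}

const receiptOptions = receiptPdfOptions(80, 200);
console.log(receiptOptions.width, receiptOptions.height); // 80mm 200mm
// With Puppeteer: await page.pdf({ path: 'receipt.pdf', ...receiptOptions });
```

The helper validates the margins up front so a typo such as a 50 mm margin on an 80 mm page fails loudly instead of producing an empty PDF.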

3. Dockerizing Puppeteer

Anyone who’s tried running Puppeteer in production knows that missing system dependencies quickly become a stumbling block. To avoid “it works on my machine” issues, I wrap my Puppeteer-based projects in a Docker container that has all the necessary libraries and fonts. This ensures a consistent runtime environment wherever the container is deployed.

Dockerfile

# Dockerfile
FROM node:18-bullseye

# Install shared libraries and fonts required by Puppeteer's bundled Chrome on Debian
RUN apt-get update && apt-get install -y --no-install-recommends \
    ca-certificates \
    fonts-freefont-ttf \
    fonts-ipafont-gothic \
    fonts-kacst \
    fonts-liberation \
    fonts-thai-tlwg \
    fonts-wqy-zenhei \
    libasound2 \
    libatk-bridge2.0-0 \
    libatk1.0-0 \
    libc6 \
    libcairo2 \
    libcups2 \
    libdbus-1-3 \
    libexpat1 \
    libfontconfig1 \
    libgbm1 \
    libgcc1 \
    libgdk-pixbuf2.0-0 \
    libglib2.0-0 \
    libgtk-3-0 \
    libnspr4 \
    libnss3 \
    libpango-1.0-0 \
    libpangocairo-1.0-0 \
    libstdc++6 \
    libx11-6 \
    libx11-xcb1 \
    libxcb1 \
    libxcomposite1 \
    libxcursor1 \
    libxdamage1 \
    libxext6 \
    libxfixes3 \
    libxi6 \
    libxrandr2 \
    libxrender1 \
    libxss1 \
    libxtst6 \
    lsb-release \
    wget \
    xdg-utils \
  && rm -rf /var/lib/apt/lists/*

WORKDIR /app

COPY package*.json ./
# npm ci installs exactly what package-lock.json specifies, keeping builds reproducible
RUN npm ci

COPY . .

# Expose a port for our service
EXPOSE 3000

CMD ["npm", "run", "start"]

Now every docker build produces an image with the shared libraries and fonts Chromium expects, so the service no longer fails at launch because of a missing library or renders text with the wrong fonts.
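One more container gotcha: this image runs as root, and Chromium refuses to start as root unless it is launched with --no-sandbox. Docker also gives containers a small /dev/shm by default (64 MB), which can crash Chromium on large pages; the --disable-dev-shm-usage flag makes it use /tmp for shared memory instead. The hypothetical helper below (chromiumArgs is our own name) picks launch flags from the current user ID. Disabling the sandbox weakens isolation, so reserve it for HTML you trust or an otherwise locked-down container.

```typescript
// Hypothetical helper: choose Chromium flags for the current environment.
// `uid` is the numeric user ID (process.getuid?.() on Linux/macOS; undefined on Windows).
function chromiumArgs(uid: number | undefined): string[] {
  // Docker's default 64 MB /dev/shm is often too small; use /tmp for shared memory instead.
  const args = ['--disable-dev-shm-usage'];
  if (uid === 0) {
    // Chromium will not start as root while its sandbox is enabled.
    args.push('--no-sandbox', '--disable-setuid-sandbox');
  }
  return args;
}

console.log(chromiumArgs(0).join(' '));
// With Puppeteer: const browser = await puppeteer.launch({ args: chromiumArgs(process.getuid?.()) });
```

An alternative is to add a non-root USER to the Dockerfile and keep the sandbox, but then Puppeteer's downloaded browser must be installed somewhere that user can read.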

4. Running Locally with Docker Compose and LocalStack

If you want to store your PDFs in S3 or need other AWS services, you’ll want to test your application locally without incessant round-trips to the AWS cloud. LocalStack is the perfect solution—it emulates AWS services, including S3, on your development machine.

docker-compose.yml

services:
  pdf-service:
    build: .
    container_name: pdf-service
    ports:
      - "3000:3000"
    environment:
      - AWS_REGION=us-east-1
      - AWS_ACCESS_KEY_ID=test
      - AWS_SECRET_ACCESS_KEY=test
    depends_on:
      - localstack

  localstack:
    image: localstack/localstack
    container_name: localstack
    ports:
      - "4566:4566" # LocalStack's single edge port for all emulated services
    environment:
      - SERVICES=s3
      - DEBUG=1
      - AWS_DEFAULT_REGION=us-east-1
    volumes:
      # Current LocalStack releases keep their state under /var/lib/localstack
      - "./.localstack:/var/lib/localstack"

After a quick docker-compose up, your PDF service and LocalStack run side by side. Point your AWS SDK at LocalStack’s endpoint (http://localstack:4566 from inside the Compose network, http://localhost:4566 from your host machine), and you can treat your local environment as if it were AWS and store PDFs in a “fake” S3 bucket during development.

5. Saving PDFs to S3 (Emulated with LocalStack)

To illustrate how we might store generated PDFs in an S3 bucket, here’s a short snippet using the AWS SDK v3 for JavaScript. Make sure you’ve installed @aws-sdk/client-s3 before running the code:

Generate PDF and Upload to S3

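import puppeteer from 'puppeteer';
import { S3Client, PutObjectCommand } from '@aws-sdk/client-s3';

// Inside AWS Lambda (AWS_LAMBDA_FUNCTION_NAME is set), rely on the SDK defaults: the real S3
// endpoint and the function's temporary role credentials. Everywhere else, target LocalStack.
const runningInLambda = Boolean(process.env.AWS_LAMBDA_FUNCTION_NAME);

const s3Client = new S3Client({
  region: process.env.AWS_REGION || 'us-east-1',
  ...(runningInLambda
    ? {}
    : {
        endpoint: 'http://localstack:4566',
        forcePathStyle: true, // needed for LocalStack: bucket subdomains of "localstack" won't resolve
        credentials: {
          accessKeyId: process.env.AWS_ACCESS_KEY_ID || 'test',
          secretAccessKey: process.env.AWS_SECRET_ACCESS_KEY || 'test',
        },
      }),
});

async function generatePdfAndUpload(html: string, bucketName: string, key: string) {
  // Chromium refuses to run as root (the Docker default) unless the sandbox is disabled
  const browser = await puppeteer.launch({ args: ['--no-sandbox', '--disable-setuid-sandbox'] });
  try {
    const page = await browser.newPage();
    await page.setContent(html, { waitUntil: 'networkidle0' });

    // Instead of writing to a path, we generate the PDF in memory
    const pdfBuffer = await page.pdf();

    // Upload the buffer directly: S3 needs a known Content-Length, which a buffer provides
    // and an unsized stream does not
    await s3Client.send(
      new PutObjectCommand({
        Bucket: bucketName,
        Key: key,
        Body: pdfBuffer,
        ContentType: 'application/pdf',
      }),
    );
    console.log(`PDF uploaded to S3 as ${key}`);
  } finally {
    // Close the browser even if rendering or uploading fails
    await browser.close();
  }
}

export { generatePdfAndUpload };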

Generating the PDF in memory is often more flexible than writing it to the local file system, especially when working with cloud services or microservices. This snippet sends the PDF to “S3” (LocalStack, in our case) without ever writing it to the container’s file system.
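PutObject fails if the target bucket does not exist, and a fresh LocalStack container starts with no buckets at all. One way to handle this, sketched below under the assumption that a startup-time check is acceptable for local development, is a hypothetical ensureBucket helper (our own name) that probes the bucket with HeadBucketCommand and calls CreateBucketCommand only when S3 answers 404.

```typescript
import { S3Client, HeadBucketCommand, CreateBucketCommand } from '@aws-sdk/client-s3';

// Hypothetical helper: creates `bucket` if it is missing. Returns true when it created one.
// Note: outside us-east-1, real S3 also expects CreateBucketConfiguration.LocationConstraint.
async function ensureBucket(client: Pick<S3Client, 'send'>, bucket: string): Promise<boolean> {
  try {
    await client.send(new HeadBucketCommand({ Bucket: bucket }));
    return false; // the bucket already exists
  } catch (err) {
    const status = (err as { $metadata?: { httpStatusCode?: number } }).$metadata?.httpStatusCode;
    if (status !== 404) throw err; // permission or network problems should surface, not be masked
  }
  await client.send(new CreateBucketCommand({ Bucket: bucket }));
  return true;
}
```

In the Express service you might call await ensureBucket(s3Client, 'my-local-bucket') once before app.listen(). In real AWS, buckets are usually created by infrastructure code rather than at runtime, so treat this as a local-development convenience.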

6. Deploying as an AWS Lambda with AWS SAM CLI

Building a local Docker-based environment is great, but eventually, you might want to deploy your PDF generation service to AWS. By combining Puppeteer with AWS Lambda, you can build a highly scalable PDF generation function that only costs money when it’s used—a perfect pay-per-request model. AWS SAM (Serverless Application Model) CLI can simplify both packaging and deploying your Lambda.

template.yaml

Transform: AWS::Serverless-2016-10-31
Description: PDF generation service
Resources:
  PdfLambda:
    Type: AWS::Serverless::Function
    Properties:
      Handler: index.handler
      Runtime: nodejs18.x
      CodeUri: ./dist
      Timeout: 30
      MemorySize: 2048 # Chromium needs far more than the 128 MB default; tune for your documents
      Policies:
        - S3WritePolicy: # SAM policy template scoped to a single bucket
            BucketName: my-pdf-bucket
      Environment:
        Variables:
          BUCKET_NAME: "my-pdf-bucket"
      Events:
        GeneratePdf:
          Type: Api # exposes POST /pdf through API Gateway
          Properties:
            Path: /pdf
            Method: post
Outputs:
  PdfLambdaFunction:
    Description: "PDF Lambda Function ARN"
    Value: !GetAtt PdfLambda.Arn

And here’s a minimal index.ts file that picks up the incoming payload, generates a PDF, and stores it in S3:

index.ts (Lambda Entry)

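// index.ts - Lambda entry point
import type { APIGatewayProxyEvent, APIGatewayProxyResult } from 'aws-lambda';
import { generatePdfAndUpload } from './pdfService'; // your Puppeteer logic

export const handler = async (event: APIGatewayProxyEvent): Promise<APIGatewayProxyResult> => {
  const bucketName = process.env.BUCKET_NAME;
  if (!bucketName) {
    return { statusCode: 500, body: JSON.stringify({ error: 'BUCKET_NAME is not configured' }) };
  }

  // API Gateway may base64-encode the body; event.isBase64Encoded tells us when
  const body =
    event.body && event.isBase64Encoded ? Buffer.from(event.body, 'base64').toString('utf8') : event.body;
  const html = body || '<h1>Hello from Lambda</h1>';

  // Derive key from event or time
  const key = `test-${Date.now()}.pdf`;

  try {
    await generatePdfAndUpload(html, bucketName, key);
  } catch (error) {
    console.error(error);
    return { statusCode: 500, body: JSON.stringify({ error: 'PDF generation failed' }) };
  }

  return {
    statusCode: 200,
    body: JSON.stringify({ message: 'PDF generated and uploaded!', key }),
  };
};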

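Package size is the main hurdle. A .zip-based function, including its layers, must stay under 250 MB unzipped, and the full puppeteer package downloads a large Chrome build that also expects system libraries the Lambda runtime does not provide. A common workaround is puppeteer-core paired with a Chromium build compiled for Lambda (the community-maintained @sparticuz/chromium package is one example), often shipped as a Lambda layer so several functions can share it. Alternatively, package the function as a container image, which Lambda accepts up to 10 GB and which lets you install Chromium’s dependencies much as in the Dockerfile from section 3.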
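The handler above, like the Express server in section 9, names objects with Date.now(), so two requests landing in the same millisecond would overwrite each other’s PDF. A hypothetical alternative (buildPdfKey is our own name) adds a random UUID from Node’s built-in crypto module plus a date prefix, which also groups objects by day in the bucket.

```typescript
import { randomUUID } from 'node:crypto';

// Hypothetical helper: builds keys like "invoices/2024-05-01/<uuid>.pdf".
function buildPdfKey(prefix: string, now: Date = new Date()): string {
  const day = now.toISOString().slice(0, 10); // YYYY-MM-DD, in UTC
  const cleanPrefix = prefix.replace(/^\/+|\/+$/g, ''); // drop leading/trailing slashes
  // randomUUID() makes collisions between concurrent requests practically impossible
  return `${cleanPrefix}/${day}/${randomUUID()}.pdf`;
}

console.log(buildPdfKey('invoices'));
```

In the handler, const key = buildPdfKey('pdfs'); would replace the Date.now() line.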

7. Handling Dynamic Data & Templating

Often, you won’t just be dumping static HTML into Puppeteer. You’ll need to inject dynamic data—like user info, purchase histories, or real-time analytics—into your document. For this scenario, templating engines are your friend.

You might choose libraries like ejs, handlebars, or pug. For example, with Handlebars, you can separate your presentation (HTML layout) from your logic, making the code more maintainable:

Install Handlebars

npm install handlebars

Using Handlebars

import Handlebars from 'handlebars';
import fs from 'fs/promises';

async function generateDynamicPdf(): Promise<string> {
  // Load an HTML template from disk
  const template = await fs.readFile('./invoiceTemplate.html', 'utf-8');
  const compileTemplate = Handlebars.compile(template);

  const items = [
    { description: 'Laptop', price: 1599 },
    { description: 'Monitor', price: 299 },
  ];

  // Data to be inserted in the template; the total is derived so it always matches the items
  const data = {
    customerName: 'Jane Doe',
    items,
    total: items.reduce((sum, item) => sum + item.price, 0), // 1898
  };

  const htmlWithData = compileTemplate(data);
  // Pass htmlWithData to Puppeteer as before, e.g. generatePdfFromHtml(htmlWithData, 'invoice.pdf')
  return htmlWithData;
}

This approach ensures that your PDFs can adapt to a wide range of data inputs while keeping your templates organized and straightforward to modify.
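The template file itself is not shown above, so here is a hypothetical inline equivalent of invoiceTemplate.html, assuming the same data shape. It also registers a custom helper (money, our own name) that prints prices with two decimals. Because {{ }} output is HTML-escaped, a customer name containing markup is rendered as text rather than injected into the PDF.

```typescript
import Handlebars from 'handlebars';

// Hypothetical inline version of invoiceTemplate.html.
const invoiceTemplate = `<html>
  <body>
    <h1>Invoice for {{customerName}}</h1>
    <table>
      {{#each items}}
      <tr><td>{{description}}</td><td>{{money price}}</td></tr>
      {{/each}}
    </table>
    <p>Total: {{money total}}</p>
  </body>
</html>`;

// Custom helper (our own naming): formats a number with exactly two decimals.
Handlebars.registerHelper('money', (value: number) => value.toFixed(2));

const renderInvoice = Handlebars.compile(invoiceTemplate);

const html = renderInvoice({
  customerName: 'Jane <Doe>', // angle brackets are escaped to &lt; and &gt;
  items: [
    { description: 'Laptop', price: 1599 },
    { description: 'Monitor', price: 299 },
  ],
  total: 1898,
});
console.log(html);
```

Use triple braces ({{{ }}}) only for values you have already sanitized, since they bypass the escaping.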

8. Troubleshooting Common Issues and Performance Tips

Anytime I’ve used Puppeteer for PDF generation, I’ve come across a few repeat issues or performance pitfalls. Here’s a quick overview:

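Fonts Not Rendering Properly: Ensure that the images and fonts you’re using are installed in the container or served from URLs the container can reach. Fonts referenced only by name (for example, in font-family) must be installed in the Docker image; web fonts need to finish loading before page.pdf() runs, which waitUntil: 'networkidle0' usually ensures.
CSS Media Queries: page.pdf() renders with print media, so rules inside @media print apply and screen-only styles do not. Double-check your print styling, or call await page.emulateMediaType('screen') before page.pdf() if the PDF should match the on-screen layout.
Large PDFs or Many Concurrent Requests: Be mindful of memory constraints, because each Chromium instance is memory-heavy. Reuse one browser (opening a fresh page per request) or a small pool instead of launching a new browser for every request, and cap how many renders run at once.
Timeouts with Complex Pages: Some pages have dynamic content or heavy scripts that take time to load. Choose an appropriate waitUntil value and, if necessary, raise the timeout option of page.setContent() (30 seconds by default).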

These practices can help resolve common headaches and keep your PDF service stable, even when the real world doesn’t always match a controlled workshop environment.
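The browser-reuse advice above can be sketched as two small pieces: a lazily launched browser shared across requests, and a limiter that caps concurrent renders so memory stays bounded. This is a minimal sketch rather than a full pool; sharedBrowser and createLimiter are hypothetical names, and the ClosableBrowser interface stands in for Puppeteer’s Browser so the example runs without launching Chromium.

```typescript
// Minimal structural type; a Puppeteer Browser satisfies it.
interface ClosableBrowser {
  close(): Promise<void>;
}

// Hypothetical helper: launches one browser on first use and shares it across callers.
// `launch` is injected, e.g. () => puppeteer.launch().
function sharedBrowser<B extends ClosableBrowser>(launch: () => Promise<B>) {
  let pending: Promise<B> | undefined;
  return {
    get(): Promise<B> {
      if (!pending) {
        pending = launch().catch((err: unknown) => {
          pending = undefined; // allow a retry on the next call if the launch failed
          throw err;
        });
      }
      return pending;
    },
    async close(): Promise<void> {
      const current = pending;
      pending = undefined;
      if (current) await (await current).close();
    },
  };
}

// Hypothetical helper: runs at most `max` tasks at once; extra calls wait in FIFO order.
function createLimiter(max: number) {
  let active = 0;
  const queue: Array<() => void> = [];
  return async function run<T>(task: () => Promise<T>): Promise<T> {
    if (active >= max) {
      // Wait until a finishing task hands its slot directly to us
      await new Promise<void>((resolve) => queue.push(resolve));
    } else {
      active++;
    }
    try {
      return await task();
    } finally {
      const next = queue.shift();
      if (next) next(); // pass the slot on without releasing it, so no newcomer can jump the queue
      else active--;
    }
  };
}
```

In the service you would create shared = sharedBrowser(() => puppeteer.launch()) and limit = createLimiter(2) once, then wrap each request in limit(async () => { const page = await (await shared.get()).newPage(); try { /* setContent, pdf */ } finally { await page.close(); } }). Closing each page keeps pages from piling up inside the shared browser.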

9. A Full Example: LocalStack-Hosted PDF Service with S3 Storage

Let’s tie together the Docker + Puppeteer + LocalStack trifecta in a single sample application. Our final service will:

Expose an API endpoint /pdf to receive HTML as POST data.
Use Puppeteer to generate PDFs from that HTML content.
Upload the PDF to an S3 bucket, which is emulated by LocalStack.
Respond with a success message and the object key.

Below is a rough directory structure and relevant code snippets to get you running:

Project Structure

my-pdf-service/
├── Dockerfile
├── docker-compose.yml
├── src/
│   ├── index.ts
│   ├── pdfService.ts
│   └── server.ts
├── package.json
└── tsconfig.json

server.ts: A simple Express server to accept HTML via POST, forward it along to our Puppeteer logic, and return a JSON response.

server.ts

import express from 'express';
import { generatePdfAndUpload } from './pdfService';

const app = express();
// Treat every request body as text; raise the 100 kB default limit for larger documents
app.use(express.text({ type: '*/*', limit: '5mb' }));

app.post('/pdf', async (req, res) => {
  try {
    const html = req.body;
    // With no body, req.body is not a string (an empty object in Express 4), so check the type
    if (typeof html !== 'string' || html.trim() === '') {
      res.status(400).json({ error: 'No HTML provided' });
      return;
    }
    const bucket = 'my-local-bucket';
    const key = `pdf-${Date.now()}.pdf`;
    await generatePdfAndUpload(html, bucket, key);
    res.status(200).json({ message: 'PDF generated and uploaded!', key });
  } catch (error) {
    console.error(error);
    res.status(500).json({ error: 'Something went wrong!' });
  }
});

const port = Number(process.env.PORT) || 3000;
app.listen(port, () => {
  console.log(`PDF service listening on port ${port}`);
});

pdfService.ts: Our Puppeteer logic plus integration with LocalStack’s S3 endpoint.

pdfService.ts

import puppeteer from 'puppeteer';
import { S3Client, PutObjectCommand } from '@aws-sdk/client-s3';

const s3Client = new S3Client({
  region: 'us-east-1',
  endpoint: 'http://localstack:4566', // LocalStack's service name on the Compose network
  forcePathStyle: true,
  credentials: {
    accessKeyId: 'test',
    secretAccessKey: 'test',
  },
});

export async function generatePdfAndUpload(html: string, bucketName: string, key: string) {
  // The container runs as root, and Chromium will not start as root without --no-sandbox
  const browser = await puppeteer.launch({ args: ['--no-sandbox', '--disable-setuid-sandbox'] });
  try {
    const page = await browser.newPage();
    await page.setContent(html, { waitUntil: 'networkidle0' });

    const pdfBuffer = await page.pdf({ format: 'A4', printBackground: true });

    // Send the buffer itself as Body: S3 requires a known Content-Length,
    // which an unsized stream does not provide
    await s3Client.send(
      new PutObjectCommand({
        Bucket: bucketName,
        Key: key,
        Body: pdfBuffer,
        ContentType: 'application/pdf',
      }),
    );
  } finally {
    // Close the browser even if rendering or uploading fails
    await browser.close();
  }
}

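With these files in place, along with the earlier docker-compose.yml and Dockerfile, you’re ready to run docker-compose up. LocalStack starts without any buckets, so create the one server.ts writes to before sending requests, for example with the AWS CLI (any dummy credentials work): aws --endpoint-url=http://localhost:4566 s3 mb s3://my-local-bucket. Then test the service with a simple cURL command: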

cURL Request

# Single quotes stop interactive bash from treating "!" as history expansion
curl -X POST \
  -H "Content-Type: text/plain" \
  --data '<h1>Hello Container!</h1>' \
  http://localhost:3000/pdf

That’s it! You’ll get a JSON response with the object key, and the PDF is stored in your LocalStack S3 bucket (aws --endpoint-url=http://localhost:4566 s3 ls s3://my-local-bucket lists it). This setup closely mirrors a live cloud environment while keeping your development loop fast and offline.

10. Conclusion

We’ve taken quite the journey—starting with a simple HTML-to-PDF approach using Puppeteer, then scaling up to containerization, local AWS emulation via LocalStack, and even serverless deployment with AWS Lambda. This robust workflow allows you to develop, test, and deploy PDF generation pipelines quickly, ensuring consistent results from local environments all the way to production.

By layering Docker, LocalStack, and AWS SAM on top of Puppeteer’s PDF generation features, you’re equipped to tackle everything from simple invoice creation to complex, on-the-fly, data-driven reports. I hope this tutorial has provided a clear and enjoyable path to building your own professional-grade PDF generation service.

Whether you need dynamic templating, specialized fonts, or advanced print layouts, you can fine-tune your setup to match your unique requirements. Puppeteer’s flexibility combined with microservices, containers, and the serverless paradigm sets you up for success in just about any environment.