
Building a Production Web Scraper
Jan 18, 2026 · By Ege Uysal
Most Go concurrency tutorials show you how to spawn goroutines and pass data through channels. Then you try to build something real and immediately hit problems the tutorials never mentioned: deadlocks, backpressure, graceful shutdowns that aren't actually graceful.
I built Drop, a price tracking service that scrapes hundreds of product URLs daily and notifies users about price drops. Here's what I learned about Go concurrency patterns that actually matter in production.
The Architecture: Scheduler + Worker Pool + Scraper
Drop's core is simple: check product prices periodically, update the database, notify users when prices drop below their targets.
The naive approach would be to loop through items and scrape them one by one. At even one second per request, 1,000 items checked hourly means over 16 minutes of sequential scraping per batch. Unacceptable.
Instead, I built a worker pool that processes items concurrently while respecting resource constraints.
Pattern 1: Buffered Channels for Producer-Consumer Decoupling
The scheduler needs to distribute work to multiple workers. Here's the critical decision:
// Bad: Unbuffered channel
jobs := make(chan ItemJob)

// Good: Buffered to item count
jobs := make(chan ItemJob, len(items))
Why buffer to len(items)?
With an unbuffered channel, every send blocks until a worker receives. This creates tight coupling between producer and consumer speeds. If workers aren't ready yet, or you miscalculate worker count, you get deadlocks.
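To make that concrete, here is a minimal standalone sketch (not Drop's code) of the failure mode: the producer fills an unbuffered channel before any worker is running, the first send blocks forever, and the runtime aborts.

package main

// Minimal repro of the unbuffered failure mode (illustrative only): the first
// send blocks because no worker goroutine is receiving yet, and the runtime
// reports "fatal error: all goroutines are asleep - deadlock!".
func main() {
    jobs := make(chan int) // unbuffered

    for i := 0; i < 3; i++ {
        jobs <- i // blocks forever on the first iteration
    }
    close(jobs)
}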
Buffered channels decouple this completely. The producer enqueues all jobs immediately without waiting for workers:
func (s *PriceRefresherScheduler) refreshAllPrices() {
    ctx := context.Background()

    items, err := s.itemsService.GetItemsDueForCheck(ctx)
    if err != nil {
        log.Printf("Error while refreshing prices: %s", err.Error())
        return
    }

    if len(items) == 0 {
        log.Printf("No items due for price refresh")
        return
    }

    log.Printf("Starting concurrent refresh of %d items with %d workers", len(items), s.workerCount)

    // Create channels for work distribution
    jobs := make(chan ItemJob, len(items))
    results := make(chan string, len(items))

    // Start worker pool
    for w := 1; w <= s.workerCount; w++ {
        go s.priceRefreshWorker(w, jobs, results)
    }

    // Producer fills the queue
    for _, item := range items {
        jobs <- ItemJob{
            ID:     item.ID,
            UserID: item.UserID,
            URL:    item.URL,
            Name:   item.Name,
        }
    }
    close(jobs) // Signal: no more work coming

    // Collect results...
}
This gives you:
- No producer blocking - fill the queue instantly
- Workers can start anytime - no timing assumptions
- Clean shutdown - close the channel when done
The buffer size matters. Too small and you're back to blocking. Too large and you waste memory. Buffering to exactly len(items) is perfect for bounded work batches.
Pattern 2: Worker Pool with Independent Goroutines
Each worker is dead simple - no shared state, just pure functions:
// priceRefreshWorker processes individual refresh jobs.
// Each worker runs independently with no shared state.
func (s *PriceRefresherScheduler) priceRefreshWorker(
    workerID int,
    jobs <-chan ItemJob,
    results chan<- string,
) {
    for job := range jobs {
        log.Printf("Worker %d processing item: %s (ID: %s)", workerID, job.Name, job.ID)

        err := s.itemsService.RefreshPrice(
            context.Background(),
            job.ID,
            job.UserID,
            job.URL,
        )
        if err != nil {
            results <- fmt.Sprintf("FAILED: %s (%s): %v", job.ID, job.Name, err)
        } else {
            results <- fmt.Sprintf("SUCCESS: %s", job.Name)
        }
    }
}
The for job := range jobs pattern is crucial. It:
- Automatically handles channel closing (loop exits when channel closes)
- Processes jobs until queue is empty
- Requires zero synchronization primitives
Workers are completely independent. No mutexes, no wait groups in the worker itself, no coordination needed.
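If the idiom is unfamiliar, here is a minimal standalone example (separate from Drop's code) showing how the loop drains a closed channel and then exits on its own:

package main

import "fmt"

// The range-over-channel idiom in isolation: the loop body runs once per
// queued value and terminates automatically once the channel is closed and
// drained. No extra signalling is needed.
func main() {
    jobs := make(chan int, 3)
    jobs <- 1
    jobs <- 2
    jobs <- 3
    close(jobs)

    for job := range jobs {
        fmt.Println("processing job", job) // runs three times, then the loop ends
    }
}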
Pattern 3: Results Collection with Blocking Receive
After dispatching jobs, we need to wait for all results:
successCount := 0
failCount := 0

for range items {
    result := <-results
    if strings.HasPrefix(result, "SUCCESS:") {
        successCount++
        log.Println(result)
    } else {
        failCount++
        log.Println(result)
    }
}

log.Printf("Price refresh complete: %d succeeded, %d failed out of %d total",
    successCount, failCount, len(items))
This blocks until exactly len(items) results come back. No busy waiting, no sleep loops - just synchronous collection of async work.
The results channel is also buffered to len(items), preventing workers from blocking when sending results. Workers finish faster, resources are released sooner.
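Counting receives works because the number of jobs is known up front. A common alternative, sketched below with hypothetical item values and a placeholder worker body, uses a sync.WaitGroup to close the results channel once every worker returns, so the collector can simply range over it:

package main

import (
    "fmt"
    "log"
    "sync"
)

// A sketch of the WaitGroup variant (not what Drop does): close results once
// all senders have finished, then range over the channel instead of receiving
// exactly len(items) times.
func collectWithWaitGroup(items []string, workerCount int) {
    jobs := make(chan string, len(items))
    results := make(chan string, len(items))

    var wg sync.WaitGroup
    for w := 1; w <= workerCount; w++ {
        wg.Add(1)
        go func(id int) {
            defer wg.Done()
            for job := range jobs {
                // Placeholder for the real refresh call.
                results <- fmt.Sprintf("worker %d handled %s", id, job)
            }
        }(w)
    }

    for _, item := range items {
        jobs <- item
    }
    close(jobs)

    // Close results only after every sender is done.
    go func() {
        wg.Wait()
        close(results)
    }()

    for result := range results {
        log.Println(result)
    }
}

func main() {
    collectWithWaitGroup([]string{"item-a", "item-b", "item-c"}, 2)
}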
Pattern 4: Timeout Context for Individual Operations
Web scraping has an enemy: hanging requests. A single stuck HTTP call can block a worker indefinitely.
In the service layer, I wrap each scrape with a timeout context:
func (s *service) CreateItem(ctx context.Context, userID string, req CreateItemRequest) (*ItemResponse, error) {
    if err := utils.ValidateURL(req.URL); err != nil {
        return nil, fmt.Errorf("invalid URL: %w", err)
    }

    if err := s.checkForDuplicates(ctx, userID, req.URL); err != nil {
        return nil, fmt.Errorf("duplicate item: %w", err)
    }

    // Create a timeout context for scraping to prevent hanging
    scrapeCtx, cancel := context.WithTimeout(ctx, 15*time.Second)
    defer cancel()

    // Create channels to receive the scrape result or timeout
    resultChan := make(chan *scraper.PriceInfo, 1)
    errorChan := make(chan error, 1)

    // Run scraping in a separate goroutine
    go func() {
        pi, err := s.scraper.ScrapePrice(req.URL)
        if err != nil {
            errorChan <- err
        } else {
            resultChan <- pi
        }
    }()

    // Wait for result or timeout
    var priceInfo *scraper.PriceInfo
    var scrapeErr error

    select {
    case pi := <-resultChan:
        priceInfo = pi
    case err := <-errorChan:
        scrapeErr = err
    case <-scrapeCtx.Done():
        return nil, fmt.Errorf("price scraping timed out after 15 seconds")
    }

    // Handle the result
    if scrapeErr != nil {
        if strings.Contains(scrapeErr.Error(), "out of stock") {
            currentPrice := 0.0
            if req.TargetPrice != nil {
                currentPrice = *req.TargetPrice
            }
            inStock := false
            req.CurrentPrice = currentPrice
            req.InStock = &inStock
        } else {
            return nil, fmt.Errorf("failed to scrape price: %w", scrapeErr)
        }
    } else if priceInfo != nil {
        req.CurrentPrice = priceInfo.Price
        req.InStock = &priceInfo.InStock
    }

    // Create item in database...
}
Why not just rely on HTTP client timeout?
The HTTP client timeout only covers the request/response cycle. It doesn't account for:
- HTML parsing time (goquery can be slow on massive pages)
- Price extraction logic
- Any other processing in the scraping function
The context timeout covers the entire operation. After 15 seconds, we abandon it completely and return an error. The goroutine might still be running, but we've moved on.
The channel buffers (make(chan X, 1)) prevent goroutine leaks - even if we timeout and stop listening, the goroutine can still send its result without blocking.
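Stripped of the business logic, the shape of the pattern looks like this. This is a sketch, with a placeholder slowOperation standing in for ScrapePrice:

package main

import (
    "context"
    "errors"
    "fmt"
    "time"
)

// slowOperation stands in for any call that might hang.
func slowOperation() (string, error) {
    time.Sleep(20 * time.Second)
    return "result", nil
}

// runWithTimeout mirrors the wrapper in CreateItem. The buffer of 1 is what
// prevents the leak: if we time out and stop listening, the goroutine can
// still complete its send and exit instead of blocking forever.
func runWithTimeout(ctx context.Context, timeout time.Duration) (string, error) {
    ctx, cancel := context.WithTimeout(ctx, timeout)
    defer cancel()

    resultChan := make(chan string, 1)
    errorChan := make(chan error, 1)

    go func() {
        res, err := slowOperation()
        if err != nil {
            errorChan <- err
            return
        }
        resultChan <- res
    }()

    select {
    case res := <-resultChan:
        return res, nil
    case err := <-errorChan:
        return "", err
    case <-ctx.Done():
        return "", errors.New("operation timed out")
    }
}

func main() {
    _, err := runWithTimeout(context.Background(), 15*time.Second)
    fmt.Println(err) // prints "operation timed out" after 15 seconds
}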
Pattern 5: Ticker-Based Scheduling with Graceful Stop
The scheduler runs continuously, checking prices at regular intervals:
type PriceRefresherScheduler struct {
    itemsService items.Service
    interval     time.Duration
    workerCount  int
    stopChan     chan bool
}

func NewPriceRefresherScheduler(itemsService items.Service, interval time.Duration, workerCount int) *PriceRefresherScheduler {
    return &PriceRefresherScheduler{
        itemsService: itemsService,
        interval:     interval,
        workerCount:  workerCount,
        stopChan:     make(chan bool),
    }
}

func (s *PriceRefresherScheduler) Start() {
    s.refreshAllPrices() // Initial run

    ticker := time.NewTicker(s.interval)
    go func() {
        for {
            select {
            case <-ticker.C:
                s.refreshAllPrices()
            case <-s.stopChan:
                ticker.Stop()
                return
            }
        }
    }()
}

func (s *PriceRefresherScheduler) Stop() {
    s.stopChan <- true
}
The select statement handles two cases:
- ticker.C: Time to run another batch
- stopChan: Shutdown signal received
This is intentionally simple. When Stop() is called, the scheduler stops accepting new batches immediately. In-flight scraping jobs continue until completion - we don't forcefully cancel them.
Why not wait for workers to finish?
In practice, scraping jobs complete quickly (under 15s due to our timeout). Forcefully canceling them mid-scrape creates more problems than it solves - half-written database records, resource leaks, complex cleanup logic.
The tradeoff: shutdown takes up to 15 seconds. For a background service, that's acceptable.
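For completeness, here is a hypothetical wiring of Start and Stop to OS signals. Drop's actual entrypoint may differ, and the sketch assumes it lives alongside the scheduler type, with the scheduler constructed elsewhere via NewPriceRefresherScheduler:

import (
    "log"
    "os"
    "os/signal"
    "syscall"
)

// run blocks until SIGINT or SIGTERM, then stops scheduling new batches.
// In-flight scrapes finish on their own, bounded by the scrape timeouts.
func run(scheduler *PriceRefresherScheduler) {
    scheduler.Start()

    sig := make(chan os.Signal, 1)
    signal.Notify(sig, syscall.SIGINT, syscall.SIGTERM)
    <-sig

    log.Println("shutting down: no new batches will be scheduled")
    scheduler.Stop()
}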
The Scraper: Keeping It Simple
The actual scraping logic is deliberately simple:
type Scraper struct {
    client *http.Client
}

func NewScraper() *Scraper {
    return &Scraper{
        client: &http.Client{
            Timeout: 10 * time.Second,
        },
    }
}

func (s *Scraper) ScrapePrice(url string) (*PriceInfo, error) {
    req, err := http.NewRequest("GET", url, nil)
    if err != nil {
        return nil, fmt.Errorf("failed to create request: %w", err)
    }
    req.Header.Set("User-Agent", "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36...")

    resp, err := s.client.Do(req)
    if err != nil {
        return nil, fmt.Errorf("failed to fetch page: %w", err)
    }
    defer resp.Body.Close()

    if resp.StatusCode != 200 {
        return nil, fmt.Errorf("bad status code: %d", resp.StatusCode)
    }

    doc, err := goquery.NewDocumentFromReader(resp.Body)
    if err != nil {
        return nil, fmt.Errorf("failed to parse HTML: %w", err)
    }

    priceText := s.extractPrice(doc)
    if priceText == "" {
        return nil, fmt.Errorf("item out of stock")
    }

    price, err := s.parsePrice(priceText)
    if err != nil {
        return nil, fmt.Errorf("failed to parse price: %w", err)
    }

    return &PriceInfo{
        Price:   price,
        InStock: true,
    }, nil
}

func (s *Scraper) extractPrice(doc *goquery.Document) string {
    // Amazon's split price format
    whole := doc.Find(".a-price-whole").First().Text()
    fraction := doc.Find(".a-price-fraction").First().Text()

    if whole != "" {
        whole = strings.ReplaceAll(strings.TrimSpace(whole), ".", "")
        if fraction != "" {
            fraction = strings.TrimSpace(fraction)
            return whole + "." + fraction
        }
        return whole
    }

    return ""
}

func (s *Scraper) parsePrice(priceText string) (float64, error) {
    priceText = strings.TrimSpace(priceText)
    priceText = strings.ReplaceAll(priceText, "$", "")
    priceText = strings.ReplaceAll(priceText, "£", "")
    priceText = strings.ReplaceAll(priceText, "€", "")
    priceText = strings.ReplaceAll(priceText, ",", "")

    price, err := strconv.ParseFloat(priceText, 64)
    if err != nil {
        return 0, fmt.Errorf("failed to parse price: %w", err)
    }

    return price, nil
}
No fancy pooling, no connection reuse magic. Go's default http.Client already reuses connections through its underlying Transport, and the 10-second timeout bounds each request.
For rate limiting, I rely on Caddy at the infrastructure level rather than application logic. User-facing rate limits are enforced at the reverse proxy, which prevents abuse while keeping the scraper code focused on one thing: extracting the price from the HTML.
What About Error Handling?
I don't retry failed scrapes. Here's why:
If a scrape fails, it's usually because:
- Item is out of stock (we handle this explicitly)
- Website is down (retry won't help immediately)
- Rate limited (retry makes it worse)
- Network timeout (already waited 15s)
Instead of complex retry logic, failed items simply stay in the database with their last known price. They'll be retried on the next scheduled batch (1 hour later).
func (s *service) RefreshPrice(ctx context.Context, itemID, userID, url string) error {
    log.Printf("RefreshPrice called: itemID=%s, userID=%s, url=%s", itemID, userID, url)

    priceInfo, err := s.scraper.ScrapePrice(url)
    if err != nil {
        if strings.Contains(err.Error(), "out of stock") {
            log.Printf("Item out of stock, setting price to 0: itemID=%s", itemID)
            _, err := s.repo.UpdateItemPrice(ctx, itemID, userID, 0, false)
            return err
        }
        return fmt.Errorf("failed to scrape price: %w", err)
    }

    log.Printf("Updating price for item %s: $%.2f, in_stock=%t", itemID, priceInfo.Price, priceInfo.InStock)

    _, err = s.repo.UpdateItemPrice(ctx, itemID, userID, priceInfo.Price, priceInfo.InStock)
    if err != nil {
        return fmt.Errorf("failed to update price: %w", err)
    }

    return nil
}
This "eventual consistency" approach is fine for price tracking. Users don't need real-time updates - they need reliable notifications over time.
Lessons Learned
Buffered channels aren't premature optimization - they're essential for decoupling producers and consumers. Use them.
Worker pools are simpler than you think - no mutexes, no wait groups in workers, just channels and goroutines.
Timeout contexts prevent disasters - wrap any IO operation that might hang. Always.
Graceful shutdown is a spectrum - you don't always need perfect cleanup. Sometimes "stop accepting work and let current jobs finish" is good enough.
Keep scrapers simple - don't prematurely optimize connection pooling or retry logic. Get it working first, optimize only when you have real metrics showing it's needed.
Leverage infrastructure for rate limiting - instead of building complex application-level rate limiting, use reverse proxies like Caddy. Simpler code, easier to tune.
The Results
With 5 workers and 1-hour intervals, Drop handles hundreds of items without breaking a sweat. Each batch completes in under 2 minutes. No timeouts, no deadlocks, no mysterious hangs.
The entire concurrency layer is under 100 lines of code. That's the real win - not clever optimizations, but simple patterns that work reliably in production.
Want to see the full code? Drop is open source: github.com/egeuysall/drop