Design a Mobile Crash Intelligence Platform

Published:

๐ŸŽฏ Problem

You own platform tooling for multiple mobile apps. Need crash intelligence, not just raw logs.

Requirements:

  • near real-time ingestion
  • symbolication for iOS stack traces
  • regression detection by app version/build
  • ownership mapping and alert routing

๐Ÿงฑ Pipeline

flowchart LR; SDK["Crash SDK"]-->Ingest["Ingestion API"]; Ingest-->Queue["Event Queue"]; Queue-->Sym["Symbolication Service"]; Sym-->Group["Issue Grouping"]; Group-->Store["Issue Store"]; Store-->Alert["Alerting & Routing"]; Store-->Dash["Crash Dashboard"];

๐Ÿ”‘ Core Components

1) Grouping engine

  • fingerprint by normalized stack + exception type + binary UUID
  • collapse noisy variants
  • maintain issue history across app versions

2) Regression detector

  • compare crash-free users by build
  • detect sudden issue rate increase post-release
  • auto-tag likely regressions

3) Ownership routing

  • map module/namespace to team
  • create Slack/Jira alert with severity and links

Severity Model

  • P0: launch crash spike
  • P1: major flow crash with high impact
  • P2: low-frequency long-tail issue

๐Ÿšจ Hard Problems to mention

  • symbol file mismatch and delayed symbol uploads
  • duplicate reports from retry storms
  • balancing alert sensitivity vs noise

โœ… Interview Close

Show you understand:

  • observability + release workflow integration
  • data pipeline reliability
  • practical on-call usability (actionable alerts, not dashboards only)