Managed Services HealthTech

Upheal

24/7 Managed AWS Operations During Rapid Growth

  • 15 min: Off-hours response
  • 24/7: Managed operations
  • Tested: Disaster recovery
  • SOC2: Compliance ready

The Challenge

Upheal, an AI-powered mental health platform, was experiencing rapid user growth. Their platform processes sensitive therapy session data, generating AI-assisted clinical notes and insights for mental health practitioners. As adoption accelerated, the operational demands on their AWS infrastructure grew faster than their small engineering team could handle.

The core problems were interconnected:

  • No dedicated DevOps capacity: Upheal’s engineering team was fully committed to product development. Infrastructure incidents during off-hours went unaddressed until the next business day, risking data availability for practitioners relying on the platform during patient sessions.
  • Compliance pressure: As a HealthTech company handling protected health information, Upheal needed to demonstrate SOC2 compliance to enterprise customers. This required documented operational procedures, access controls, audit trails, and regular security reviews that the team had no bandwidth to implement.
  • Untested disaster recovery: While basic backups existed, disaster recovery procedures had never been formally documented or tested. Given the sensitivity of clinical data, any data loss scenario posed both a business and ethical risk.
  • Scaling uncertainty: Traffic patterns were becoming less predictable as Upheal expanded into new markets. The team lacked visibility into infrastructure performance trends and capacity planning data.

Upheal needed a managed operations partner who understood both AWS infrastructure and the compliance requirements of the healthcare technology space.

The Solution

Remangu implemented a comprehensive managed operations framework tailored to Upheal’s specific needs as a fast-growing HealthTech company.

24/7 Monitoring and Incident Response

We deployed a multi-layered monitoring architecture using CloudWatch as the foundation. Custom metrics were defined for application-level health indicators beyond standard infrastructure metrics, including API response times for clinical note generation, queue depths for AI processing pipelines, and database connection pool utilization.
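As a rough illustration of what application-level metrics like these might look like, the sketch below builds datapoints in the shape accepted by the CloudWatch PutMetricData API. The namespace, metric names, and dimension values are hypothetical and not taken from Upheal's setup. Sending the payload (for example through an AWS SDK client) is left out so the example stays self-contained.

```python
from datetime import datetime, timezone

# Hypothetical namespace; the real one is not given in this case study.
NAMESPACE = "Example/ClinicalNotes"


def metric_datum(name, value, unit, service):
    """Build one datapoint in the shape CloudWatch PutMetricData expects."""
    return {
        "MetricName": name,
        "Dimensions": [{"Name": "Service", "Value": service}],
        "Timestamp": datetime.now(timezone.utc),
        "Value": float(value),
        "Unit": unit,
    }


def build_payload(api_latency_ms, queue_depth, pool_in_use, pool_size):
    """Assemble the three application-level indicators described above."""
    pool_utilization = 100.0 * pool_in_use / pool_size
    return {
        "Namespace": NAMESPACE,
        "MetricData": [
            metric_datum("NoteGenerationLatency", api_latency_ms, "Milliseconds", "api"),
            metric_datum("AiQueueDepth", queue_depth, "Count", "worker"),
            metric_datum("DbPoolUtilization", pool_utilization, "Percent", "api"),
        ],
    }


# Illustrative values only.
payload = build_payload(api_latency_ms=420, queue_depth=17, pool_in_use=45, pool_size=60)
for datum in payload["MetricData"]:
    print(datum["MetricName"], datum["Value"], datum["Unit"])
```

Keeping the payload construction separate from the API call makes the metric definitions easy to unit test without AWS credentials.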

Alert routing was integrated directly with Upheal’s Slack workspace, providing engineers with real-time visibility into infrastructure status without requiring them to monitor dashboards. Critical alerts triggered Remangu’s on-call rotation with a guaranteed 15-minute response time during off-hours.
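One way such routing could work is sketched below, assuming CloudWatch alarms are delivered as SNS notifications and Slack is reached through an incoming webhook that accepts a JSON body with a "text" field. The severity prefix in alarm names is an assumed naming convention, not something described in this engagement, and the actual HTTP post and paging call are omitted.

```python
import json

# Assumed convention: alarm names start with their severity, e.g. "sev1-...".
PAGING_PREFIXES = ("sev1-", "sev2-")


def route_alarm(sns_message: str) -> dict:
    """Turn a CloudWatch alarm notification into a Slack payload and a paging decision."""
    alarm = json.loads(sns_message)
    name = alarm["AlarmName"]
    state = alarm["NewStateValue"]
    # Every state change is posted to Slack for visibility.
    slack_payload = {"text": f"[{state}] {name}: {alarm['NewStateReason']}"}
    # Only critical alarms entering the ALARM state page the on-call engineer.
    page = state == "ALARM" and name.lower().startswith(PAGING_PREFIXES)
    return {"slack": slack_payload, "page_on_call": page}


# Illustrative notification body.
example = json.dumps({
    "AlarmName": "sev1-db-connection-pool",
    "NewStateValue": "ALARM",
    "NewStateReason": "Threshold Crossed",
})
print(route_alarm(example))
```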

The incident response process followed a structured runbook approach:

  • Automated triage classified incidents by severity and impact
  • Predefined remediation procedures covered the most common failure modes
  • Escalation paths ensured that complex issues reached senior engineers quickly
  • Post-incident reviews were conducted for all Severity 1 and 2 events
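The steps above can be sketched as a small triage function. The severity rules, component names, and runbook identifiers here are invented for illustration. The case study does not specify how severity was actually assigned.

```python
from dataclasses import dataclass

# Hypothetical mapping from failing component to a predefined runbook.
RUNBOOKS = {
    "database": "rds-failover",
    "queue": "drain-and-restart-workers",
}


@dataclass
class Incident:
    component: str
    customer_facing: bool
    data_at_risk: bool
    affected_share: float  # fraction of users affected, 0.0 to 1.0


def classify(incident: Incident) -> int:
    """Assign a severity from 1 (worst) to 4 using illustrative rules."""
    if incident.data_at_risk:
        return 1
    if incident.customer_facing and incident.affected_share >= 0.5:
        return 1
    if incident.customer_facing:
        return 2
    if incident.affected_share > 0:
        return 3
    return 4


def triage(incident: Incident) -> dict:
    """Classify, pick a runbook, and decide on escalation and review."""
    severity = classify(incident)
    runbook = RUNBOOKS.get(incident.component)
    return {
        "severity": severity,
        "runbook": runbook,
        # Escalate the worst incidents and anything without a known remediation.
        "escalate": severity == 1 or runbook is None,
        # Reviews are held for all Severity 1 and 2 events.
        "post_incident_review": severity <= 2,
    }


print(triage(Incident("database", True, False, 0.8)))
print(triage(Incident("cache", False, False, 0.0)))
```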

Compliance-Ready Operations

Working toward SOC2 readiness required systematic changes to how infrastructure was managed. We implemented:

Access management was handled through AWS IAM, with role-based access controls, enforced MFA for all human access, and automated access reviews on a quarterly cadence. Service accounts were scoped to the minimum required permissions, and their credentials were rotated on a schedule.
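An automated access review of this kind might look roughly like the following. The account record layout and the 90-day thresholds are assumptions made for the sketch. In practice the input would come from an IAM inventory export rather than a hard-coded list.

```python
from datetime import date

# Assumed review thresholds; the actual policy values are not stated.
MAX_CREDENTIAL_AGE_DAYS = 90
MAX_INACTIVE_DAYS = 90


def review_access(accounts, today):
    """Return (account name, finding) pairs for a periodic access review."""
    findings = []
    for account in accounts:
        name = account["name"]
        # MFA is required for human users only.
        if account["human"] and not account["mfa_enabled"]:
            findings.append((name, "MFA not enabled"))
        if (today - account["credential_created"]).days > MAX_CREDENTIAL_AGE_DAYS:
            findings.append((name, "credential overdue for rotation"))
        if (today - account["last_used"]).days > MAX_INACTIVE_DAYS:
            findings.append((name, "inactive, candidate for removal"))
    return findings


# Invented sample inventory.
inventory = [
    {"name": "alice", "human": True, "mfa_enabled": False,
     "credential_created": date(2024, 3, 1), "last_used": date(2024, 3, 30)},
    {"name": "ci-deploy", "human": False, "mfa_enabled": False,
     "credential_created": date(2023, 10, 1), "last_used": date(2024, 3, 31)},
]
for name, finding in review_access(inventory, today=date(2024, 4, 1)):
    print(f"{name}: {finding}")
```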

Audit logging was centralized using CloudTrail, with logs stored in S3 buckets that had S3 Object Lock enabled to keep them tamper-resistant. Log retention policies were configured to meet SOC2 requirements, and automated analysis flagged anomalous access patterns.
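To give a sense of what flagging anomalous access patterns can involve, here is a minimal sketch that scans CloudTrail-style records for two simple signals: console logins without MFA and repeated denied API calls from one principal. The two rules and the threshold are assumptions chosen for illustration, not the analysis actually used in this engagement.

```python
from collections import Counter

# Assumed threshold for repeated denials by one principal.
DENIED_THRESHOLD = 3


def flag_anomalies(records):
    """Return (principal ARN, reason) pairs for suspicious CloudTrail records."""
    flags = []
    denied = Counter()
    for record in records:
        arn = record.get("userIdentity", {}).get("arn", "unknown")
        extra = record.get("additionalEventData", {})
        # ConsoleLogin events report whether MFA was used.
        if record.get("eventName") == "ConsoleLogin" and extra.get("MFAUsed") == "No":
            flags.append((arn, "console login without MFA"))
        if record.get("errorCode") == "AccessDenied":
            denied[arn] += 1
    for arn, count in denied.items():
        if count >= DENIED_THRESHOLD:
            flags.append((arn, f"{count} denied API calls"))
    return flags


# Invented sample records using CloudTrail field names.
sample = [
    {"eventName": "ConsoleLogin",
     "userIdentity": {"arn": "arn:aws:iam::111122223333:user/example"},
     "additionalEventData": {"MFAUsed": "No"}},
]
print(flag_anomalies(sample))
```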

Change management processes were formalized with all infrastructure changes deployed through version-controlled pipelines. Manual changes were restricted to break-glass scenarios with mandatory post-change documentation.

Evidence collection was automated wherever possible, generating compliance artifacts that Upheal could provide directly to auditors without engineering involvement.

Disaster Recovery Testing

We designed and documented comprehensive disaster recovery procedures covering every critical component of Upheal’s infrastructure. RDS automated backups were configured with point-in-time recovery enabled and cross-region replication for the primary database. S3 data was protected with versioning and cross-region replication for critical buckets.

Critically, we moved beyond documentation to regular testing. Quarterly DR exercises simulated various failure scenarios including database failover, region-level outages, and data corruption recovery. Each exercise produced a detailed report documenting recovery time, data integrity verification, and any gaps identified in the procedures.
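The core of an exercise report like this is simple arithmetic on three timestamps, sketched below. The 1-hour and 4-hour targets mirror the figures reported for this engagement. The timestamps, the row-count integrity check, and the function name are illustrative assumptions.

```python
from datetime import datetime, timedelta

RPO_TARGET = timedelta(hours=1)  # maximum tolerated data loss window
RTO_TARGET = timedelta(hours=4)  # maximum tolerated time to restore


def dr_report(last_recoverable, failure_at, restored_at, rows_expected, rows_restored):
    """Summarize one DR exercise against its recovery targets."""
    recovery_point = failure_at - last_recoverable  # data written in this window is lost
    recovery_time = restored_at - failure_at        # duration of the outage
    return {
        "recovery_point": recovery_point,
        "recovery_time": recovery_time,
        "rpo_met": recovery_point <= RPO_TARGET,
        "rto_met": recovery_time <= RTO_TARGET,
        # A row count is a crude stand-in for real integrity verification.
        "data_intact": rows_expected == rows_restored,
    }


# Invented exercise timeline.
report = dr_report(
    last_recoverable=datetime(2024, 1, 15, 9, 55),
    failure_at=datetime(2024, 1, 15, 10, 0),
    restored_at=datetime(2024, 1, 15, 12, 30),
    rows_expected=1_000_000,
    rows_restored=1_000_000,
)
for key, value in report.items():
    print(f"{key}: {value}")
```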

Proactive Capacity Management

Monthly operational reviews analyzed infrastructure utilization trends and projected capacity needs based on Upheal’s growth trajectory. These reviews identified optimization opportunities, such as right-sizing RDS instances and adjusting reserved instance coverage, that kept costs aligned with actual usage while maintaining headroom for growth.
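A capacity projection of this sort can be approximated with a straight-line fit over recent monthly utilization, as in the sketch below. The utilization figures and the 80% threshold are made up, and a linear trend is only one possible model. Real growth may call for something less naive.

```python
def linear_fit(values):
    """Least-squares line through (0, v0), (1, v1), ...; returns (slope, intercept)."""
    n = len(values)
    xs = range(n)
    mean_x = sum(xs) / n
    mean_y = sum(values) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, values))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    return slope, mean_y - slope * mean_x


def months_until(values, threshold):
    """Months from the latest datapoint until the trend crosses the threshold."""
    slope, intercept = linear_fit(values)
    if slope <= 0:
        return None  # flat or falling utilization never reaches the threshold
    crossing_month = (threshold - intercept) / slope
    return max(0.0, crossing_month - (len(values) - 1))


# Hypothetical monthly peak CPU utilization (%) for a database instance.
history = [40, 45, 50, 55]
print(months_until(history, threshold=80))
```

A result of None would suggest looking at right-sizing downward rather than adding headroom.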

The Results

Remangu’s managed operations engagement delivered stability and confidence across Upheal’s infrastructure.

15-minute off-hours incident response became the operational standard rather than an aspiration. Over the first six months of the engagement, mean time to acknowledge dropped from hours to under 10 minutes, and mean time to resolve for common incidents decreased by 70% through runbook automation.

24/7 managed operations freed Upheal’s engineering team entirely from infrastructure on-call responsibilities. Engineering velocity on product features increased measurably as context-switching between product work and infrastructure firefighting was eliminated.

Tested disaster recovery gave Upheal verifiable confidence in their data protection posture. Quarterly DR exercises consistently met recovery point objectives of under 1 hour and recovery time objectives of under 4 hours for full environment restoration. Two real incidents during the engagement period were resolved using the tested procedures without data loss.

SOC2 compliance readiness was achieved within four months of engagement start. The systematic approach to access controls, audit logging, and change management produced a compliance posture that passed pre-audit assessment, enabling Upheal to pursue enterprise customer contracts that required SOC2 attestation.

Tech Stack

CloudWatch, AWS IAM, Slack Integration, DR Automation, S3, RDS

“Having Remangu manage our AWS operations meant our engineers could focus entirely on building the product. Their response times and proactive approach gave us confidence that our infrastructure was in expert hands.”

Andre Lampe, Co-founder, Upheal
