Staff Network Engineer (Menlo Park, CA or Durham, NC) #4507 at GRAIL
GRAIL · Menlo Park, United States of America · Hybrid
- Office in Menlo Park
Responsibilities
- Staff Network Engineering - AWS and Hybrid Cloud
- AWS VPC Engineering
- Design, build, and maintain Amazon VPCs including CIDR planning, subnet design (public/private), route tables, Internet Gateways (IGW), NAT gateways, and VPC endpoints (Interface/Gateway).
- Configure and manage security controls such as Security Groups, NACLs, AWS Network Firewall, and AWS WAF for defense‑in‑depth across environments.
- Hybrid Connectivity
- Implement and support hybrid connectivity using AWS Direct Connect, Site‑to‑Site VPNs, and AWS Transit Gateway for scalable VPC‑to‑VPC and on‑prem connectivity.
- Traffic Management & DNS
- Configure Amazon Route 53 for internal and external DNS, routing policies, health checks, and failover.
- Deploy and manage Elastic Load Balancing (ALB/NLB/GWLB) to provide high availability, TLS termination, path‑based routing, and/or TCP/UDP load balancing.
- On‑Prem & Data Center Networking
- Operate and troubleshoot on‑prem and data center networks using Juniper and Aruba platforms (switching, routing, VLANs, VRFs, BGP/OSPF).
- Configure, manage, and tune Palo Alto Networks firewalls, including security policies, NAT, VPN, and content inspection.
- Monitoring, Logging & Dashboards
- Design and implement end‑to‑end monitoring, alerting, and dashboards for network health, performance, and security, leveraging tools such as:
  - VPC Flow Logs, CloudWatch metrics/logs, and Route 53 health checks.
  - Firewall logs and on‑prem device telemetry.
- Build and maintain dashboards for:
  - Link utilization, latency, packet loss, and error rates (DX, VPN, TGW, campus links).
  - Load balancer health, connection metrics, and capacity.
  - DNS performance and resolution issues.
- Establish actionable alerting thresholds and runbooks to support rapid incident triage and resolution.
- Capacity Planning & Performance
- Perform ongoing capacity planning for AWS networking (VPCs, TGW, DX, VPN, load balancers) and on‑prem links, forecasting growth and identifying bottlenecks.
- Analyze traffic patterns and utilization data to right‑size connectivity, optimize routing, and plan upgrades before they become constraints.
- Run performance tests and baselines (throughput, latency, failover behavior) and tune configurations accordingly.
- Incident Response & Troubleshooting
- Lead network‑related incident response, including real‑time troubleshooting across layers (DNS, TCP/IP, TLS, HTTP, internal app protocols).
- Drive root‑cause analysis (RCA) and implement corrective and preventive actions (runbooks, automation, design changes).
- Architecture & Design (Significant Component)
- Own end‑to‑end network architecture for multi‑account, multi‑region AWS environments, ensuring scalability, reliability, observability, and security.
- Develop and maintain network reference architectures and patterns for:
  - Isolated and regulated environments.
  - Service‑to‑service connectivity using PrivateLink, VPC peering, and/or VPC Lattice.
  - Ingress/egress patterns through ELB, Global Accelerator, and centralized egress VPCs.
- Design application connectivity, segmentation, and zero‑trust network patterns in partnership with Security and Platform teams.
- Evaluate and introduce advanced AWS networking capabilities (e.g., AWS App Mesh, Amazon VPC Lattice, AWS Global Accelerator) where they provide clear operational or performance benefits.
- Ensure architectural designs explicitly include observability and capacity planning requirements (telemetry, KPIs, SLOs).
- Automation, Tooling & Governance
- Build and maintain infrastructure‑as‑code for network components (e.g., Terraform/CloudFormation modules for VPCs, TGWs, Direct Connect, routing, firewall rules).
- Integrate network provisioning and configuration into CI/CD pipelines to support safe, auditable, and repeatable deployments.
- Automate generation and updates of network monitoring, logging, and dashboard configurations where possible.
- Define and codify network standards, guardrails, and best practices for AWS and on‑prem networking, including monitoring and capacity baselines.
- Partner with Security and Compliance to ensure designs and implementations meet regulatory and internal policy requirements, including logging and retention requirements.
- Collaboration & Leadership
- Act as the primary subject matter expert for AWS networking, hybrid connectivity, and network observability, providing guidance to platform, SRE, security, and application teams.
- Mentor other engineers on networking fundamentals, AWS networking, performance troubleshooting, and effective monitoring/dashboards.
- Lead and review technical designs, RFCs, and architectural decisions for network‑related projects.
- Communicate complex networking concepts, trade‑offs, and capacity risks to both technical and non‑technical stakeholders.
Required Qualifications
- 10+ years of experience in network engineering, including several years in a senior/staff or architecture‑oriented role.
- Deep, hands‑on experience with AWS networking:
  - Amazon VPC (CIDR design, subnets, IGW/NAT, route tables, endpoints).
  - Security Groups and NACLs.
  - AWS Transit Gateway, Site‑to‑Site VPN, and AWS Direct Connect.
  - Route 53 and ELB (ALB/NLB/GWLB).
- Strong enterprise/data center networking experience:
  - Juniper and/or Aruba networking platforms.
  - Routing/switching (BGP, OSPF, VLANs, VRFs, link aggregation, redundancy protocols).
- Hands‑on experience with Palo Alto Networks firewalls (policy management, NAT, VPN, content inspection).
- Demonstrated experience setting up monitoring, logging, and dashboards for network infrastructure (cloud and on‑prem), and using this data for incident response and capacity planning.
- Proven track record building and operating secure, highly available, and scalable network infrastructures in production.
- Solid understanding of network security principles, segmentation, and zero‑trust concepts.
- Strong troubleshooting skills across layers (DNS, TCP/IP, TLS, HTTP, internal app protocols).
- Excellent communication skills and experience working in cross‑functional, fast‑moving environments.
Preferred Qualifications
- Experience in healthcare, life sciences, or other highly regulated or security‑sensitive environments.
- Experience with:
  - AWS Network Firewall, AWS WAF.
  - AWS App Mesh and/or Amazon VPC Lattice.
  - AWS Global Accelerator and edge networking patterns.
- Proficiency with infrastructure‑as‑code (e.g., Terraform, CloudFormation) and automation/scripting (Python, Bash, PowerShell, etc.).
- Experience designing SLOs, KPIs, and alerting strategies for network reliability and performance.
- Familiarity with SD‑WAN, SASE, and/or Zero Trust Network Access (ZTNA) solutions.
- Relevant certifications such as AWS Certified Advanced Networking – Specialty, CCNP/CCIE, or Palo Alto Networks certifications.