We’re a team of AI, technology, and language experts whose DNA lives in Alexa, Siri, Watson, and virtually every human language technology product on the market. Now we’re building an industry-leading knowledge management and Retrieval-Augmented Generation (RAG) platform. Our proprietary, cutting-edge natural language processing capabilities transform unstructured data into meaningful experiences that increase productivity with unmatched accuracy and speed.
The Opportunity:
The Ingestion team is responsible for everything that happens between content arriving from a connector and that content being ready for search and retrieval. This means document processing pipelines that handle parsing, text extraction, chunking, metadata enrichment, embedding generation, and index population — across every file format and content type our customers throw at us.
We’re in the middle of a significant architectural evolution — migrating from a legacy pipeline to a modern, workflow-orchestrated architecture with cleanly separated processing stages: intake, transformation, enrichment, and indexing. The team is also actively designing the next iteration of the pipeline to push further on throughput and resilience.
This is real systems engineering: the problems are about scale, reliability, and the messy realities of processing millions of documents with wildly different structures.
The Ideal Candidate:
- Is self-driven and comfortable operating with autonomy inside a structured team
- Gets energized by architectural challenges, not just feature work
- Has the patience and discipline to improve existing systems while building new ones
- Understands that pipeline engineering is about handling the 10,000 edge cases, not just the happy path
- Is motivated by the mission: building the processing backbone that makes enterprise AI accurate and reliable
- Communicates well in a remote-first environment and collaborates naturally across team boundaries
In This Role You Will:
- Design and build pipeline stages for our modern ingestion architecture, from document intake through embedding generation and index writing
- Contribute to the design of the next-generation pipeline architecture as the system evolves
- Improve system stability and scale: identify bottlenecks, reduce failure rates, and build observability into every stage
- Work with workflow orchestration tools to manage complex, multi-step document processing with retry logic, error handling, and state management
- Handle the realities of document diversity: PDFs, HTML, Office formats, images, and structured and semi-structured data, all flowing through the same pipeline
- Collaborate with the Connectors team (upstream) and the Retrieval team (downstream) to ensure data flows cleanly across system boundaries
- Participate in the ongoing migration from legacy systems, balancing new development with operational stability
What You'll Need to Be Successful:
- 5+ years of software engineering experience, with meaningful time on data processing pipelines, ETL systems, or similar infrastructure
- Strong proficiency in Python and/or Go
- Experience with workflow orchestration tools such as Temporal, Airflow, Prefect, Step Functions, or similar
- Understanding of distributed systems patterns: queues, workers, backpressure, idempotency, and retry strategies
- Hands-on experience with Kubernetes, Docker, Terraform, and Helm
- Familiarity with message brokers and event streaming (Kafka, RabbitMQ, SQS, or similar)
- Comfort working across cloud providers (AWS, Azure, GCP)
Additional Information:
Benefits for Full-Time Employees:
- Remote-first organization
- 100% company-paid health, dental, and vision benefits for you and your dependents
- Life insurance, short-term and long-term disability
- 401(k)
- Unlimited PTO
We are interested in every qualified candidate who is authorized to work in the United States. However, we are not able to sponsor or take over sponsorship of employment visas at this time.
Pryon will not consider race, religion, sex, sexual orientation, or national origin in any way that violates the nation's civil rights laws.