As an AI Engineer specializing in Agentic AI enablement, you will design, prototype, iterate, and productionize reusable agent capabilities that run on the enterprise AI Backbone across cloud and edge environments. You will build and harden agent behaviors (tool-use, policy constraints, memory/RAG patterns where applicable), create evaluation and regression test harnesses, and integrate agents with enterprise systems using MCP-style connectors/clients. You serve as the technical complement to the AI Solutions Lead by translating domain workflows into reliable agent components, while partnering closely with platform teams to deploy using standardized CI/CD, security, and observability patterns.
Responsibilities
1) Agent Engineering & Productionization (40%)
- Prototype, iterate, and productionize domain-aligned agent modules (plans, tool-use, task execution flows) that operate reliably within defined workflows. (Execute)
- Build and maintain versioned agent assets (prompts, policies, tool schemas, configs) with clear change logs and reproducibility. (Execute)
- Optimize agent performance for latency and token efficiency within defined constraints (especially for edge-targeted scenarios when applicable). (Execute)
2) Evaluation, Testing & Quality Signals (25%)
- Implement an AI system testing harness for assigned agents: regression suites, golden test sets (where applicable), and comparison reports for prompt/model variants. (Execute)
- Maintain evaluation metadata (test versioning, metrics, correlation IDs) to support traceability and repeatability. (Execute)
- Contribute to safety/quality checks (hallucination, toxicity, policy compliance) as part of evaluation workflows defined by the program. (Execute/Consult)
3) Integration with Tools and MCPs (20%)
- Implement or extend MCP clients/connectors for internal data products and approved enterprise apps using standardized interfaces, scopes, and audit patterns. (Execute)
- Validate integration behavior with sandbox credentials, representative test data, and end-to-end workflow tests with stakeholders. (Execute/Consult)
4) Operational Readiness & Collaboration (15%)
- Ensure owned components meet operational readiness expectations: logging/telemetry coverage, runbook notes, basic SLI/SLO alignment for agent health and integration reliability. (Execute/Consult)
- Collaborate with platform and transformation teams to clarify requirements, triage issues, and incorporate feedback from internal/external teams into improvements. (Execute/Consult/Informed)
- Identify and implement small process improvements that increase repeatability (evaluation templates, prompt versioning conventions, integration test scaffolds). (Execute)
Decision-Making Autonomy: Moderate. Owns technical implementation for assigned agents/evals/integrations within established patterns; escalates cross-domain/security/policy decisions.
Supervision Required: Moderate. Receives design review and direction from L09/L10 AI leads for evaluation approach, routing standards, and sensitive integrations.
Complexity of Role: High (for L08). Requires balancing quality/latency, integrating multiple enterprise tools, and ensuring reproducible evaluation under evolving requirements.
Cross-Functional Interactions: Yes. Frequent interactions with platform, product/domain, security, SRE/observability, and enterprise app owners.
Qualifications
Minimum Qualifications
- Bachelor’s/Master’s in CS/AI/ML/Data Science (or equivalent experience).
- Hands-on experience building LLM applications (agents/tool-use/prompting) and shipping production code in Python.
Required Expertise
- Python engineering with production hygiene (testing, packaging, structured logging)
- Agentic AI frameworks/patterns: LangGraph/LangChain, CrewAI-style orchestration patterns; tool/function calling; prompt versioning
- Evaluation discipline: test sets, regression testing, offline eval metrics, A/B comparisons, failure taxonomy
- Integration engineering: APIs, auth concepts, schema-based tool integration; MCP-style interface implementation preferred
- Observability basics: correlation IDs, error analysis, latency instrumentation
- Cloud familiarity: enough to deploy and validate agents via platform pipelines (not owning infra)
Differentiating Competencies
- Ownership: takes components from prototype → tested → production-ready with clear artifacts
- Process improvement mindset: improves repeatability and reduces rework through templates and automation
- Collaboration & customer focus: works effectively with domain teams; builds what improves real workflows
- Adaptability: adjusts quickly to changing model/tool constraints and evolving requirements
- Communication: concise technical updates; can explain agent behavior and evaluation results to non-experts
