Data Engineer for Financial AI Pipelines
Build and own the data infrastructure that feeds the bank's AI agentic platform. You will design robust, low-latency pipelines on Databricks that ingest, transform, and serve internal banking data alongside third-party sources such as LSEG. Data quality, lineage, and governance are paramount: AI agents are only as reliable as the data they reason over.
This role sits at the critical intersection of data engineering and AI, directly determining the quality and scope of insights agents can deliver to bankers.
The Platform Context
This role sits within the bank's enterprise AI Agentic Platform, a strategic initiative to enhance banker productivity using large language models orchestrated via AWS Bedrock and AWS Agent Core, with data served from Databricks. The platform ingests internal banking data (credit, CRM, trade, GL) alongside external sources such as LSEG, enabling AI agents to draft documents, analyse deals, synthesise research, and surface insights on demand. Security, auditability, and regulatory compliance are non-negotiable.
Key Responsibilities
- Design and implement ELT pipelines on Databricks using Apache Spark and Delta Lake to ingest internal banking data (GL, CRM, credit, trade data) and external sources including LSEG market data feeds
- Build and maintain data contracts, schemas, and SLAs for all datasets consumed by AI agents, ensuring agents can rely on consistent, well-defined data interfaces
- Implement data cataloguing and lineage tracking using Databricks Unity Catalog to enable agent discoverability and data trust verification
- Optimise Delta Lake tables for low-latency AI retrieval workloads using Z-ordering, liquid clustering, and bloom filters
- Build streaming pipelines for near-real-time data ingestion using Databricks Structured Streaming or Apache Kafka
- Implement data quality checks, anomaly detection, and alerting pipelines to prevent agent hallucinations caused by upstream data issues
- Collaborate with Data Governance and Compliance to enforce data classification, PII masking, and access controls at the pipeline level
- Partner with AI engineers to design data schemas and embedding pipelines optimised for RAG retrieval and vector search
Skills and Experience
- Deep proficiency in Apache Spark, Delta Lake, and Python/PySpark
- Data cataloguing and governance tools: Unity Catalog, Apache Atlas, or equivalent
- Strong data modelling for analytical and AI retrieval workloads
- Experience integrating financial data vendors (LSEG, Refinitiv, Bloomberg, or equivalent)
- Streaming architecture familiarity: Kafka, Kinesis, or Databricks Structured Streaming
- Banking data model knowledge: GL structures, trade lifecycle, credit data
- Vector database experience: Pinecone, pgvector, Chroma, or similar
- AWS data services: Glue, Lake Formation, or S3 at scale
What We Offer
- Direct exposure to senior banking leadership and C-suite stakeholders
- Competitive compensation with performance-linked bonus and long-term incentive plan
- Hybrid working with flexibility — we trust our people to deliver
- Continuous learning budget and access to frontier AI tools and research
- A culture that values craftsmanship, intellectual honesty, and commercial impact
You will also receive an attractive benefits package that includes health and dental insurance, a pension, a phone, and other benefits. In addition, you will have flexible working hours, 6 weeks of vacation, and 5 care days to support your work-life balance.
Interested? If you have any questions, feel free to contact me, Nikodem Binienda, at nbin@danskebank.dk, and I will be happy to answer them.