Data Engineer
🇵🇹 Portugal | 🇬🇷 Greece | 🇵🇱 Poland | 🇪🇸 Spain | 🇨🇿 Czechia | 🇦🇱 Albania | 🇧🇬 Bulgaria | 🇪🇪 Estonia | 🇭🇺 Hungary | 🇮🇹 Italy | 🇱🇻 Latvia | 🇱🇹 Lithuania | 🇲🇩 Moldova | 🇷🇴 Romania | 🇸🇰 Slovakia
About the project (description, duration, stage)
Join Neurons Lab as a Data Engineer on a new engagement with a regulated UK & Ireland credit and lending company. The client has lifted data from multiple business entities into a newly centralized, anonymized data lake, but lacks the data-engineering depth to make it trustworthy and analytics-ready: current pipelines were assembled quickly (partly AI-assisted), and the descriptive statistics cannot yet be validated or reproduced.
You put that foundation on solid ground so the Data Science Lead can model on it with confidence: validate and re-engineer the pipelines, build the harmonization / semantic layer across entities, enforce data quality and lineage, and prepare clean, feature-ready datasets.
This is a foundational data-engineering role on a regulated data estate; data protection and reproducibility are the primary constraints on every decision.
A full-time engagement is preferred.
What you'll actually do (example tasks)
Reproduce a descriptive-statistics report end-to-end so any figure traces back to raw source — closing the gap the client admitted (numbers they can't currently defend).
Profile and reconcile differing source schemas across acquired entities: map differing field names, types, encodings and business definitions for the same concept into one conformed model.
Build dbt staging → intermediate → mart models with tests; codify the harmonized definitions the Data Science Lead specifies.
Write Great Expectations suites (null / range / uniqueness / referential checks) and wire them into the pipeline so bad data fails loudly rather than silently corrupting analysis.
Implement entity / identity resolution (deterministic + fuzzy matching) where there is no clean shared key for the same customer or account across sources.
Implement and verify anonymization / pseudonymization (hashing / tokenization / k-anonymity) and provide evidence to the client's IT / compliance team that re-identification risk is controlled.
Optimize Spark / Glue jobs over tens of millions of rows — partitioning, file formats (Parquet), incremental loads, cost control.
Orchestrate with Airflow / Step Functions; build repeatable, scheduled pipelines rather than one-off scripts.
Prepare clean, documented, feature-ready datasets for the PD / delinquency models.
Document runbooks so the offshore team can operate the pipelines and handover takes days, not weeks; help scope onboarding of the remaining (Ireland + additional) sources.
Skills
Strong SQL and Python for large-scale data processing
AWS data stack: S3, Glue, Lake Formation, Athena / Redshift, EMR / Spark, Step Functions / Airflow
Data modeling & semantic layer (dbt or equivalent); dimensional modeling
Entity resolution / record linkage across heterogeneous sources
Data-quality & testing frameworks (Great Expectations, dbt tests) and data lineage
Anonymization / pseudonymization techniques and their analytical trade-offs
Big-data processing (Spark) with performance and cost optimization at scale
Clear written and verbal English; able to document for handover and work well with a distributed team
Knowledge
GDPR fundamentals as applied to anonymized / pseudonymized financial data and UK / EU data residency
AWS Well-Architected (Analytics, Security) for BFSI
Awareness of credit / risk data structures and what downstream modeling consumers need — a plus
Experience
4+ years in data engineering, with strong AWS + Spark / SQL at scale
Demonstrated experience harmonizing / integrating data across multiple source systems
Experience building validated, reproducible pipelines in a regulated environment (BFSI, healthcare, government) is a strong plus
Comfortable stepping into a messy, partly-built data estate and bringing it up to standard
Comfortable as the sole or lead data engineer on a small (3–4 person) delivery pod






