Wikimedia Data Engineer Role

Skills

Airflow, CI/CD, Engineer, Hadoop, Hive, Java, Kafka, Python, Scala, Spark, SQL

Join the Wikimedia Foundation's Data Platform Engineering team to help power Wikipedia and its sister projects, which reach billions of users worldwide. As a Data Engineer, you will play a vital role in unifying and scaling data systems, delivering robust data infrastructure, and supporting analytics, research, and AI development. This fully remote position offers the opportunity to make an impact on global open knowledge accessibility while collaborating with a distributed and diverse team.

Job Overview
  • Design and develop scalable data pipelines and infrastructure using modern data engineering tools.
  • Support the Wikimedia Foundation's mission to freely share knowledge with every human being.
  • Collaborate across teams to improve the reliability, performance, and maintainability of the data platform.
  • Contribute to open data products and tools used by both internal teams and the global public.

Key Responsibilities
  • Design and build robust data pipelines with tools such as Airflow, Spark, and Kafka.
  • Implement monitoring and alerting systems to ensure data quality and reliability.
  • Support data governance and lineage initiatives for data management and compliance.
  • Collaborate with engineering peers to enable advanced analytics, feature development, and AI applications.
  • Continuously identify and implement operational improvements for scalability and performance.

Required Skills & Qualifications
  • 3+ years of data engineering experience, including on-premises systems (Spark, Hadoop, HDFS).
  • Proficiency in Python or Java/Scala and working knowledge of related development tools.
  • Hands-on experience with data pipeline tools such as Airflow, Kafka, Spark, and Hive.
  • Strong understanding of SQL and multiple database/query dialects (e.g., MariaDB, HiveQL, Presto).
  • Experience with troubleshooting, system design, and optimizing for performance and scaling.
  • Familiarity with CI/CD practices and software containerization.
  • Excellent communication and collaboration skills within distributed teams.
  • Bonus: Exposure to technologies such as Kubernetes, Flink, Iceberg, Druid, Cassandra.
  • Bonus: Knowledge of AI development tooling and applications in engineering.
  • Commitment to Wikimedia's values of inclusivity, openness, and free knowledge sharing.

Job Type: Remote

Salary: Not Disclosed

Experience: Entry

Duration: 12 Months
