We're looking for a Data Engineer with hands-on experience in graph databases to design, build, and optimize data pipelines and knowledge graph solutions that power advanced analytics and discovery. You'll collaborate with data scientists, platform engineers, and product teams to model complex domains, integrate heterogeneous sources, and deliver queryable, scalable graph data products.
Graph Data Modeling & Design
Design and implement property graphs and RDF/OWL-based knowledge graphs.
Develop schemas/ontologies and strategies for entity resolution and lineage; define best practices for graph modeling, naming, and versioning.
Data Engineering & Integration
Build and maintain ETL/ELT pipelines to ingest, cleanse, transform, and load data into graph stores from APIs, files, RDBMS, and event streams.
Implement batch and streaming integrations using tools such as Airflow, dbt, Kafka/Kinesis, Spark/Flink.
Optimize data quality, deduplication, key management, and incremental upserts into graphs.
Querying & APIs
Write advanced queries in Cypher, Gremlin, and/or SPARQL; tune queries and indexes for performance.
Expose graph capabilities via APIs/services (REST/GraphQL/GRANDstack) with robust governance, observability, and caching.
Performance, Reliability & Security
Capacity planning, clustering, backups, and high availability for graph databases.
Monitoring/alerting (e.g., Prometheus/Grafana, CloudWatch), profiling and query plan analysis.
Apply security best practices: encryption, RBAC/ABAC, least privilege, secrets management, and data masking/PII handling.
MLOps/Analytics Enablement (if applicable)
Support downstream analytics and graph algorithms (PageRank, community detection, embeddings) and integrate with ML pipelines.
DevOps & SDLC
Infrastructure-as-Code (Terraform, Bicep, CloudFormation), containerization (Docker, Kubernetes), and CI/CD for data/infra.
Documentation, code reviews, and contribution to data governance (catalogs, lineage, metadata).
Must have
Nice to have

Keyskills: data engineer, kubernetes, docker, data extraction, data science, iam, spark, devops, pytorch, graphql, cloudformation, python, cdc, bdd, airflow, sla, flink, neo4j, compliance, kafka, clustering, terraform, aws, sdlc, infrastructure as code