Vector Database & Embedding Engineer RAG Pipeline Development @ Opentext


Job Description

Job Summary

We are seeking an experienced Vector Database & Embedding Engineer to design, build, and optimize vector search pipelines, embedding workflows, and chunking strategies for enterprise Retrieval-Augmented Generation (RAG) systems.

This role requires deep hands-on experience with vector DBs (pgvector, Pinecone, Chroma, Milvus, Weaviate), embedding models (OpenAI, HuggingFace, Instructor, FlagEmbedding, BGE, etc.), and robust chunking/indexing pipelines for structured/unstructured data.

You will collaborate with LLM engineers, graph engineers, backend teams, and product owners to deliver high-accuracy, high-recall retrieval systems for AI applications.

Key Responsibilities

1. Vector Database Design & Management

  • Set up, configure, and manage vector DBs such as:
    • pgvector, FAISS, Pinecone, Weaviate, Chroma, Milvus
  • Design schemas for:
    • Multi-embedding storage
    • Metadata storage
    • Document-level and chunk-level indexing
  • Implement filtering, similarity search, MMR, reranking, and index optimization.

2. Embedding Pipeline Development

  • Select, fine-tune, or run embedding models such as:
    • Sentence-BERT, BGE, GTE, Instructor, FlagEmbedding
    • OpenAI Embeddings / Azure OpenAI
    • HuggingFace Transformers
  • Build:
    • Batch embedding pipelines
    • Real-time embedding APIs
    • Multi-encoder architecture for hybrid search
  • Evaluate embedding quality, dimensionality, and vector drift.

3. Chunking, Indexing & Document Processing

  • Design advanced chunking strategies:
    • Fixed window chunking
    • Sliding window
    • Semantic chunking
    • Layout-aware chunking (tables, lists, multi-column)
  • Extract content from:
    • PDFs, HTML pages, Office files, emails, scanned docs
  • Build a complete indexing pipeline:
    • Preprocessing → Chunking → Embedding → Vector DB upsert → Metadata linking
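
For illustration only, the fixed-window and sliding-window strategies named above could be sketched roughly as follows. The function name `sliding_window_chunks`, the whitespace tokenization, and the default sizes are assumptions made for this example; they are not taken from the posting, and a production pipeline would more likely count model tokens than words.

```python
def sliding_window_chunks(text, window=200, overlap=50):
    """Split text into overlapping chunks of whitespace-separated tokens.

    Hypothetical helper: with overlap=0 this reduces to fixed-window chunking.
    """
    if window <= 0:
        raise ValueError("window must be positive")
    if not 0 <= overlap < window:
        raise ValueError("overlap must satisfy 0 <= overlap < window")

    tokens = text.split()          # assumption: words stand in for tokens
    step = window - overlap        # how far the window advances each time
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(" ".join(tokens[start:start + window]))
        # Stop once the current window already reaches the end of the text.
        if start + window >= len(tokens):
            break
    return chunks
```

Each returned chunk would then be embedded and upserted together with its metadata (source document, position) in the later pipeline stages.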

4. RAG Optimization & Retrieval Tuning

  • Optimize retrieval for:
    • Accuracy
    • Latency
    • Recall / diversity
  • Implement hybrid search:
    • Vector + Keyword
    • Vector + Graph (GraphRAG)
  • Build ranking stacks using rerankers (Cross-Encoders).

5. Backend & API Development

  • Build APIs for:
    • Document ingestion
    • Embedding generation
    • Retrieval & context merging
  • Serve embedding + vector workflows using Python/FastAPI or Node.js.
  • Integrate vector search with LLM prompt templates.

6. Monitoring, Evaluation & Scaling

  • Evaluate retrieval metrics (precision@k, recall@k, MRR).
  • Implement observability for indexing, failures, and accuracy degradation.
  • Scale vector DBs horizontally & vertically based on dataset size.
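
As a rough sketch of the retrieval metrics mentioned above, the following shows one common way to compute precision@k, recall@k, and MRR. The function names and the input shapes (a ranked list of unique document IDs plus a set of relevant IDs per query) are assumptions chosen for this example.

```python
def precision_at_k(retrieved, relevant, k):
    """Fraction of the top-k retrieved IDs that are relevant."""
    if k <= 0:
        return 0.0
    hits = sum(1 for doc_id in retrieved[:k] if doc_id in relevant)
    return hits / k


def recall_at_k(retrieved, relevant, k):
    """Fraction of all relevant IDs that appear in the top-k results."""
    if not relevant:
        return 0.0
    hits = sum(1 for doc_id in retrieved[:k] if doc_id in relevant)
    return hits / len(relevant)


def mean_reciprocal_rank(runs):
    """Average of 1/rank of the first relevant hit over all queries.

    `runs` is a list of (retrieved, relevant) pairs; a query with no
    relevant hit contributes 0.
    """
    if not runs:
        return 0.0
    total = 0.0
    for retrieved, relevant in runs:
        for rank, doc_id in enumerate(retrieved, start=1):
            if doc_id in relevant:
                total += 1.0 / rank
                break
    return total / len(runs)
```

Tracking these values over time on a fixed evaluation set is one way to notice accuracy degradation after re-indexing or a change of embedding model.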

7. Collaboration & Documentation

  • Work with LLM engineers to design end-to-end RAG pipelines.
  • Maintain documentation for:
    • Embedding configs
    • Chunking logic
    • Vector schemas
    • Retrieval settings
  • Train internal teams on best practices.

Required Technical Skills

Vector Databases

  • Strong hands-on experience with:
    • pgvector (must-have for enterprise)
    • Pinecone, Chroma, Weaviate, Milvus, or FAISS
  • Deep knowledge of:
    • Index types (HNSW, IVFFlat, PQ, IVF-PQ)
    • Similarity metrics (cosine, dot, euclidean)
    • Index tuning (ef_search, ef_construction, cluster size)
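
To make the three similarity metrics listed above concrete, here is a minimal pure-Python sketch. The function names are illustrative assumptions; in practice the vector database or a numerical library would compute these.

```python
import math


def dot(a, b):
    """Dot (inner) product of two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return sum(x * y for x, y in zip(a, b))


def euclidean(a, b):
    """Euclidean (L2) distance; smaller means more similar."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    norm_a = math.sqrt(dot(a, a))
    norm_b = math.sqrt(dot(b, b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot(a, b) / (norm_a * norm_b)
```

For vectors normalized to unit length, cosine similarity and dot product give the same value, which is one reason vector normalization matters when choosing an index metric.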

Embeddings

  • Experience generating and evaluating embeddings using:
    • OpenAI / Azure OpenAI
    • InstructorXL, BGE, GTE, FlagEmbedding
    • Sentence-BERT / HF embeddings
  • Knowledge of:
    • Embedding dimensionality
    • Tokenization & vector normalization
    • Multi-embedding pipelines

Chunking & Preprocessing

  • Strong experience with document processing libraries:
    • PDFPlumber, PyMuPDF, Textract, Tika
  • Designing chunking strategies for:
    • PDFs
    • Web pages
    • Product catalogs
    • Emails & logs
  • Metadata creation and linking strategies.

Backend / Engineering

  • Python (preferred), Node.js
  • FastAPI / Flask
  • SQL & NoSQL
  • ETL pipelines (Airflow / custom)
  • Docker, Linux environments

Experience Required

  • Total Experience: 2 - 6 years
  • Relevant Vector Search / Embedding Experience: 1 - 3 years
  • Experience in building real RAG systems (highly preferred).

Preferred Skills

  • Knowledge of:
    • LangChain or LlamaIndex
    • Rerankers (Cross-Encoders)
    • Hybrid retrieval
    • Graph + Vector hybrid search
  • Experience in:
    • OCR processing
    • Data extraction
    • Enterprise search systems
  • Familiarity with:
    • RedisSearch
    • ElasticSearch vector search

Job Classification

Industry: IT Services & Consulting
Functional Area / Department: Engineering - Software & QA
Role Category: Software Development
Role: Data Engineer
Employment Type: Full time

Contact Details:

Company: Tenth Planet
Location(s): Chennai

Keyskills: Embedding, Retrieval Augmented Generation, Vector


₹ Not Disclosed


Opentext

Open Text™ is the world's largest independent provider of Enterprise Content Management software. The Company's solutions manage information for all types of business, compliance and industry requirements in the world's largest companies, government agencies and professional service firms. Ope...