Vector Database & Embedding Engineer RAG Pipeline Development @ Opentext

Vector Database & Embedding Engineer RAG Pipeline Development

Opentext
3 - 8 years
Chennai
4 months ago

Job Description

Job Summary

We are seeking an experienced Vector Database & Embedding Engineer to design, build, and optimize vector search pipelines, embedding workflows, and chunking strategies for enterprise Retrieval-Augmented Generation (RAG) systems.

This role requires deep hands-on experience with vector DBs (pgvector, Pinecone, Chroma, Milvus, Weaviate), embedding models (OpenAI, HuggingFace, Instructor, FlagEmbedding, BGE, etc.), and robust chunking/indexing pipelines for structured/unstructured data.

You will collaborate with LLM engineers, graph engineers, backend teams, and product owners to deliver high-accuracy, high-recall retrieval systems for AI applications.

Key Responsibilities

1. Vector Database Design & Management

Set up, configure, and manage vector DBs such as:

pgvector, FAISS, Pinecone, Weaviate, Chroma, Milvus

Design schemas for:

Multi-embedding storage
Metadata storage
Document-level and chunk-level indexing

Implement filtering, similarity search, MMR, reranking, and index optimization.
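As an illustration of the MMR (Maximal Marginal Relevance) step named above, here is a minimal pure-Python sketch. The function names, the toy two-dimensional vectors, and the `lambda_mult` parameter name are assumptions chosen for the example; a production system would more likely use the reranking facilities of its vector DB or retrieval framework.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def mmr(query_vec, doc_vecs, k=3, lambda_mult=0.5):
    """Return indices of up to k documents chosen by Maximal Marginal Relevance.

    lambda_mult=1.0 ranks purely by relevance to the query; lower values
    penalise documents that are similar to ones already selected.
    """
    candidates = list(range(len(doc_vecs)))
    selected = []
    while candidates and len(selected) < k:
        def score(i):
            relevance = cosine(query_vec, doc_vecs[i])
            redundancy = max(
                (cosine(doc_vecs[i], doc_vecs[j]) for j in selected),
                default=0.0,
            )
            return lambda_mult * relevance - (1 - lambda_mult) * redundancy
        best = max(candidates, key=score)
        selected.append(best)
        candidates.remove(best)
    return selected

# Toy data (hypothetical): documents 0 and 1 are near-duplicates.
docs = [[1.0, 0.0], [0.99, 0.05], [0.6, 0.8]]
diverse = mmr([1.0, 0.0], docs, k=2, lambda_mult=0.3)
relevance_only = mmr([1.0, 0.0], docs, k=2, lambda_mult=1.0)
```

With a low `lambda_mult` the near-duplicate is skipped in favour of the more distinct document; with `lambda_mult=1.0` the selection falls back to plain similarity ranking.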

2. Embedding Pipeline Development

Select, fine-tune, or run embedding models such as:

Sentence-BERT, BGE, GTE, Instructor, FlagEmbedding
OpenAI Embeddings / Azure OpenAI
HuggingFace Transformers

Build:

Batch embedding pipelines
Real-time embedding APIs
Multi-encoder architecture for hybrid search

Evaluate embedding quality, dimensionality, and vector drift.

3. Chunking, Indexing & Document Processing

Design advanced chunking strategies:

Fixed window chunking
Sliding window
Semantic chunking
Layout-aware chunking (tables, lists, multi-column)
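To make the first two strategies concrete, the sketch below implements sliding-window chunking over a token list; setting the overlap to zero gives fixed-window chunking. The function name, parameter names, and default sizes are assumptions for illustration, and whitespace splitting stands in for a real tokenizer.

```python
def sliding_window_chunks(tokens, size=200, overlap=50):
    """Split tokens into windows of `size` that share `overlap` tokens.

    overlap=0 yields plain fixed-window chunking.
    """
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("require size > 0 and 0 <= overlap < size")
    step = size - overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + size])
        if start + size >= len(tokens):
            break  # the last window already reaches the end of the input
    return chunks

words = "a b c d e f g h i j".split()
sliding = sliding_window_chunks(words, size=4, overlap=2)
fixed = sliding_window_chunks(words, size=4, overlap=0)
```

Here `sliding` contains four windows that each repeat the last two tokens of the previous one, while `fixed` contains three non-overlapping windows, the last of which is shorter.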

Extract content from:

PDFs, HTML pages, Office files, emails, scanned docs

Build a complete indexing pipeline:

Preprocessing → Chunking → Embedding → Vector DB upsert → Metadata linking
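The stages of such an indexing pipeline can be sketched end to end with toy components. Everything here is an assumption for illustration: `toy_embed` is a hashed bag-of-words placeholder rather than a real embedding model, and a plain dict stands in for the vector DB, with key assignment playing the role of an upsert.

```python
import hashlib
import math

def preprocess(text):
    """Collapse runs of whitespace and newlines into single spaces."""
    return " ".join(text.split())

def chunk(text, size=5):
    """Fixed-window chunking by word count."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]

def toy_embed(text, dim=8):
    """Placeholder embedding: hashed bag of words, L2-normalised."""
    vec = [0.0] * dim
    for word in text.lower().split():
        bucket = int(hashlib.md5(word.encode()).hexdigest(), 16) % dim
        vec[bucket] += 1.0
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]

def index_document(store, doc_id, text, source):
    """Preprocess, chunk, embed, upsert, and link metadata for one document."""
    for n, piece in enumerate(chunk(preprocess(text))):
        store[f"{doc_id}:{n}"] = {  # same key overwrites, i.e. an upsert
            "vector": toy_embed(piece),
            "text": piece,
            "metadata": {"doc_id": doc_id, "chunk": n, "source": source},
        }
    return store

store = index_document(
    {},
    "doc1",
    "Vector   search pipelines need\nclean text before chunking and embedding.",
    "example.txt",
)
```

Keying each record by document ID and chunk number makes re-indexing the same document idempotent, and the metadata links every chunk back to its source.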

4. RAG Optimization & Retrieval Tuning

Optimize retrieval for:

Accuracy
Latency
Recall / diversity

Implement hybrid search:

Vector + Keyword
Vector + Graph (GraphRAG)
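One common way to combine a vector ranking with a keyword ranking is reciprocal rank fusion (RRF). The posting does not name a fusion method, so RRF is an assumption here, as are the function name, the document IDs, and the conventional smoothing constant `k=60`.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document IDs into one ranking.

    Each document scores sum(1 / (k + rank)) over the lists it appears in.
    Ties are broken alphabetically by ID to keep the output deterministic.
    """
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=lambda d: (-scores[d], d))

vector_hits = ["d1", "d2", "d3"]   # hypothetical vector search results
keyword_hits = ["d3", "d1", "d4"]  # hypothetical keyword (e.g. BM25) results
fused = reciprocal_rank_fusion([vector_hits, keyword_hits])
```

Documents that appear in both lists rise to the top, and because RRF uses only ranks it does not require the two retrievers' scores to be on comparable scales.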

Build ranking stacks using rerankers (Cross-Encoders).

5. Backend & API Development

Build APIs for:

Document ingestion
Embedding generation
Retrieval & context merging

Serve embedding + vector workflows using Python/FastAPI or Node.js.
Integrate vector search with LLM prompt templates.

6. Monitoring, Evaluation & Scaling

Evaluate retrieval metrics (precision@k, recall@k, MRR).
Implement observability for indexing, failures, and accuracy degradation.
Scale vector DBs horizontally & vertically based on dataset size.
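For reference, the retrieval metrics mentioned in this section (precision@k, recall@k, and MRR) can be computed as in the sketch below. The function names and the sample document IDs are assumptions for illustration.

```python
def precision_at_k(retrieved, relevant, k):
    """Fraction of the top-k retrieved IDs that are relevant."""
    if k <= 0:
        return 0.0
    return sum(1 for d in retrieved[:k] if d in relevant) / k

def recall_at_k(retrieved, relevant, k):
    """Fraction of all relevant IDs that appear in the top k."""
    if not relevant:
        return 0.0
    return sum(1 for d in retrieved[:k] if d in relevant) / len(relevant)

def mean_reciprocal_rank(runs):
    """Mean of 1/rank of the first relevant hit over (retrieved, relevant) pairs.

    A query with no relevant hit contributes 0.
    """
    if not runs:
        return 0.0
    total = 0.0
    for retrieved, relevant in runs:
        for rank, d in enumerate(retrieved, start=1):
            if d in relevant:
                total += 1.0 / rank
                break
    return total / len(runs)

retrieved = ["a", "b", "c", "d"]
relevant = {"b", "d", "x"}
p2 = precision_at_k(retrieved, relevant, 2)
r4 = recall_at_k(retrieved, relevant, 4)
mrr = mean_reciprocal_rank([(retrieved, relevant), (["x", "y"], {"x"})])
```

In this example one of the top two results is relevant, two of the three relevant documents are found in the top four, and the first relevant hits sit at ranks 2 and 1 across the two queries.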

7. Collaboration & Documentation

Work with LLM engineers to design end-to-end RAG pipelines.
Maintain documentation for:

Embedding configs
Chunking logic
Vector schemas
Retrieval settings

Train internal teams on best practices.

Required Technical Skills

Vector Databases

Strong hands-on experience with:

pgvector (must-have for enterprise)
Pinecone, Chroma, Weaviate, Milvus, or FAISS

Deep knowledge of:

Index types (HNSW, IVFFlat, PQ, IVF-PQ)
Similarity metrics (cosine, dot, euclidean)
Index tuning (ef_search, ef_construction, cluster size)
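The three similarity metrics listed above are closely related once vectors are L2-normalised, which is one reason normalisation matters when choosing an index metric. The sketch below uses hypothetical helper names and toy vectors to show the relationship.

```python
import math

def dot(a, b):
    """Dot (inner) product."""
    return sum(x * y for x, y in zip(a, b))

def euclidean(a, b):
    """Euclidean (L2) distance."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def cosine(a, b):
    """Cosine similarity: dot product of the L2-normalised vectors."""
    norm = math.sqrt(dot(a, a)) * math.sqrt(dot(b, b))
    return dot(a, b) / norm if norm else 0.0

def normalize(v):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(dot(v, v))
    return [x / norm for x in v] if norm else list(v)

a, b = [3.0, 4.0], [4.0, 3.0]
ua, ub = normalize(a), normalize(b)

raw_dot = dot(a, b)          # depends on vector magnitude
cos_ab = cosine(a, b)        # magnitude-independent
unit_dot = dot(ua, ub)       # equals cosine for unit vectors
unit_dist_sq = euclidean(ua, ub) ** 2  # equals 2 - 2 * cosine for unit vectors
```

For unit-length vectors, ranking by dot product, by cosine similarity, or by ascending Euclidean distance gives the same order, so the choice mainly matters when embeddings are stored unnormalised.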

Embeddings

Experience generating and evaluating embeddings using:

OpenAI / Azure OpenAI
InstructorXL, BGE, GTE, FlagEmbedding
Sentence-BERT / HF embeddings

Knowledge of:

Embedding dimensionality
Tokenization & vector normalization
Multi-embedding pipelines

Chunking & Preprocessing

Strong experience with document processing libraries:

pdfplumber, PyMuPDF, Textract, Tika

Designing chunking strategies for:

PDFs
Web pages
Product catalogs
Emails & logs

Metadata creation and linking strategies.

Backend / Engineering

Python (preferred), Node.js
FastAPI / Flask
SQL & NoSQL
ETL pipelines (Airflow / custom)
Docker, Linux environments

Experience Required

Total Experience: 2-6 years
Relevant Vector Search / Embedding Experience: 1-3 years
Experience in building real RAG systems (highly preferred).

Preferred Skills

Knowledge of:

LangChain or LlamaIndex
Rerankers (Cross-Encoders)
Hybrid retrieval
Graph + Vector hybrid search

Experience in:

OCR processing
Data extraction
Enterprise search systems

Familiarity with:

RediSearch
Elasticsearch vector search

Job Classification

Industry: IT Services & Consulting
Functional Area / Department: Engineering - Software & QA
Role Category: Software Development
Role: Data Engineer
Employment Type: Full time

Contact Details:

Company: Tenth Planet
Location(s): Chennai


Keyskills: embedding Retrieval Augmented Generation Vector

₹ Not Disclosed

Opentext

Open Text™ is the world's largest independent provider of Enterprise Content Management software. The Company's solutions manage information for all types of business, compliance and industry requirements in the world's largest companies, government agencies and professional service firms. Ope...