Production-Ready Text Embeddings & Reranking API by Remodl AI
LexIQ Vectors is a high-performance text embedding and reranking service developed by Remodl AI, Inc. It provides state-of-the-art multilingual text embeddings and document reranking through an OpenAI-compatible API, powered by Qwen3 models optimized for search and retrieval tasks.
- 🚀 Ultra-Fast Performance: Sub-200ms response times when warm
- 🌐 100+ Languages: Full multilingual support out of the box
- 🔍 Instruction-Aware: Boost search quality by 1-5% with custom instructions
- 🔄 OpenAI Compatible: Drop-in replacement for OpenAI embeddings
- 📊 Advanced Reranking: Optimize search results with state-of-the-art reranking
- 🏗️ Enterprise Ready: Built on RunPod serverless infrastructure for scale
- Models: Qwen3-Embedding-0.6B & Qwen3-Reranker-0.6B
- Infrastructure: RunPod Serverless GPU Workers
- Engine: Infinity (high-throughput inference)
- API: OpenAI-compatible REST endpoints
- Container: Docker with CUDA 12.4.1 support
Base URL: https://api.lexiq.dev
- `GET /openai/v1/models` - List available models
- `POST /openai/v1/embeddings` - Generate embeddings
- `POST /openai/v1/rerank` - Rerank documents
```python
from openai import OpenAI

# Initialize client
client = OpenAI(
    api_key="your-lexiq-api-key",
    base_url="https://api.lexiq.dev/openai"
)

# Generate embeddings
response = client.embeddings.create(
    model="models/Qwen3-Embedding-0.6B",
    input="What is machine learning?",
    extra_body={
        "prompt_type": "query"  # Optimize for search queries
    }
)

embedding = response.data[0].embedding
```

```python
import requests

# Generate embeddings
response = requests.post(
    "https://api.lexiq.dev/openai/v1/embeddings",
    headers={
        "Authorization": "Bearer your-lexiq-api-key",
        "Content-Type": "application/json"
    },
    json={
        "model": "models/Qwen3-Embedding-0.6B",
        "input": "What is machine learning?",
        "extra_body": {
            "prompt_type": "query"
        }
    }
)

embedding = response.json()["data"][0]["embedding"]
```

Improve search quality by using different instructions for queries vs. documents:
```python
# Using built-in optimization
embedding = client.embeddings.create(
    model="models/Qwen3-Embedding-0.6B",
    input="How to implement neural networks?",
    extra_body={
        "prompt_type": "query"
    }
)

# Or with a custom instruction
embedding = client.embeddings.create(
    model="models/Qwen3-Embedding-0.6B",
    input="How to implement neural networks?",
    extra_body={
        "instruction": "Represent this programming question for finding code examples"
    }
)
```

```python
# Documents typically don't need instructions
embedding = client.embeddings.create(
    model="models/Qwen3-Embedding-0.6B",
    input="Neural networks are a fundamental component of deep learning..."
)
```

Optimize search results by reranking documents based on relevance:
```python
import requests

# Rerank documents with the default instruction
response = requests.post(
    "https://api.lexiq.dev/openai/v1/rerank",
    headers={
        "Authorization": "Bearer your-lexiq-api-key",
        "Content-Type": "application/json"
    },
    json={
        "query": "What product has the best warranty?",
        "documents": [
            "Product A: 2-year comprehensive warranty",
            "Product B: Lifetime limited warranty",
            "Product C: 90-day warranty",
            "Product D: 5-year extended warranty available"
        ],
        "return_documents": True,
        "top_k": 3
    }
)

# Or with a custom instruction
response = requests.post(
    "https://api.lexiq.dev/openai/v1/rerank",
    headers={
        "Authorization": "Bearer your-lexiq-api-key",
        "Content-Type": "application/json"
    },
    json={
        "query": "What product has the best warranty?",
        "documents": [...],
        "extra_body": {
            "instruction": "Find products with the longest warranty duration and best coverage terms"
        },
        "return_documents": True,
        "top_k": 3
    }
)

# Results are sorted by relevance score
for result in response.json()["results"]:
    print(f"Score: {result['score']:.3f} - {result['document']}")
```

- Cold Start: 10-15 seconds (model loading)
- Warm Requests: 100-200ms (embeddings), 130-180ms (reranking)
- Throughput: Up to 300 concurrent requests
- Embedding Dimensions: 1024
- Max Sequence Length: 32,768 tokens
- Batch Processing: Optimized for batches up to 32
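Once you have the 1024-dimensional vectors back from the API, ranking documents against a query is a cosine-similarity computation on your side. A minimal sketch with NumPy (the random vectors stand in for real API responses; if the service returns unit-normalized embeddings, the normalization below is a harmless no-op):

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine similarity between two embedding vectors."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy 1024-dim stand-ins for real embedding responses
rng = np.random.default_rng(0)
query_vec = rng.normal(size=1024)
doc_vecs = rng.normal(size=(3, 1024))

# Rank documents by similarity to the query, highest first
scores = [cosine_similarity(query_vec, d) for d in doc_vecs]
ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
```

For large corpora, precompute and normalize document vectors once so each query reduces to a single matrix-vector product.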
- Search Queries: Always use `prompt_type: "query"` or custom search instructions
- Documents: Use raw text without instructions for best results
- Domain-Specific: Create custom instructions for specialized use cases
```python
# Process multiple texts in one request
embeddings = client.embeddings.create(
    model="models/Qwen3-Embedding-0.6B",
    input=[
        "First document",
        "Second document",
        "Third document"
    ]
)
```

```python
# Step 1: Embed the query
query_response = client.embeddings.create(
    model="models/Qwen3-Embedding-0.6B",
    input="user search query",
    extra_body={"prompt_type": "query"}
)
query_embedding = query_response.data[0].embedding

# Step 2: Vector search in your database
candidates = vector_db.search(query_embedding, top_k=100)

# Step 3: Rerank the top candidates
reranked = requests.post(
    "https://api.lexiq.dev/openai/v1/rerank",
    headers={"Authorization": "Bearer your-key"},
    json={
        "query": "user search query",
        "documents": candidates,
        "top_k": 10
    }
)
```

```python
# Academic search
extra_body={
    "instruction": "Represent this query for finding academic research papers"
}
```

```python
# Code search
extra_body={
    "instruction": "Represent this query for finding code implementations and examples"
}
```

```python
# E-commerce search
extra_body={
    "instruction": "Represent this query for e-commerce product search"
}
```

```python
# Documentation Q&A
extra_body={
    "instruction": "Represent this question for finding answers in documentation"
}
```

All errors follow the OpenAI error format:
```json
{
  "error": {
    "message": "Model not found",
    "type": "invalid_request_error",
    "code": "model_not_found"
  }
}
```

Common error types:

- `invalid_request_error` - Invalid parameters
- `authentication_error` - Invalid API key
- `rate_limit_error` - Too many requests
- `internal_error` - Server error
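Because every error carries a stable `type` field, clients can branch on it to decide whether to retry, re-authenticate, or surface the problem. A minimal sketch (the `classify_error` helper and the action names are illustrative, not part of any SDK):

```python
import json

def classify_error(response_body: str) -> str:
    """Map an OpenAI-style error payload to a coarse client action."""
    err = json.loads(response_body).get("error", {})
    etype = err.get("type", "")
    if etype == "rate_limit_error":
        return "retry_with_backoff"   # transient: slow down and retry
    if etype == "authentication_error":
        return "check_api_key"        # not retryable: fix credentials
    if etype == "invalid_request_error":
        return "fix_request"          # not retryable: fix parameters
    return "report_and_retry"         # internal_error or unknown

body = '{"error": {"message": "Model not found", "type": "invalid_request_error", "code": "model_not_found"}}'
print(classify_error(body))  # fix_request
```

In production you would pair this with exponential backoff for the retryable cases rather than retrying immediately.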
- Documentation: https://docs.lexiq.dev
- API Status: https://status.lexiq.dev
- Support: support@remodlai.com
Remodl AI, Inc. is dedicated to building production-ready AI infrastructure that scales. LexIQ Vectors is part of the LexIQ platform, providing enterprise-grade AI capabilities for modern applications.
Built with ❤️ by Remodl AI, Inc.