Amazon Kendra Deep Dive

Amazon Kendra is an ML-powered enterprise search service that lets developers add intelligent search to their applications. Unlike traditional keyword search, Kendra understands the context and intent behind a query and returns the most relevant document — even if the exact words don't match.

What problems does Kendra solve?

Most enterprises have data scattered across dozens of systems: S3, SharePoint, Salesforce, ServiceNow, RDS, OneDrive, and more. Kendra provides a unified search layer over all of it without requiring you to migrate the data out of those systems; its connectors crawl each source and build a separate search index.

The key differentiator is the ML backbone. When a user types a natural language question, Kendra doesn't just scan for keyword hits — it uses reading comprehension models to understand the question and surface the most relevant passage or document.

High-level architecture

Setting up Kendra follows a clear pattern:

1. Create an Index: the central container that holds all indexed content and powers search.
2. Connect data sources: use native connectors to ingest from S3, SharePoint, Salesforce, ServiceNow, RDS, OneDrive, and more.
3. Ingest and sync: data is crawled and indexed on a schedule or on-demand.
4. Define document metadata: metadata fields enable faceting, filtering, and relevance tuning.
5. Ingest FAQs: upload question/answer pairs that Kendra can surface directly in results.
6. Run queries and audit: use the console or API to test and tune result quality.

Creating the Index

# Via AWS CLI
aws kendra create-index \
  --name "enterprise-search" \
  --role-arn "arn:aws:iam::123456789012:role/KendraRole" \
  --edition "ENTERPRISE_EDITION"

The index is the most important resource: it controls the edition (Developer vs Enterprise), the IAM role Kendra assumes to write CloudWatch logs and metrics, and the encryption settings. Each data source gets its own IAM role for reading from the source system. Enterprise edition is the one intended for production workloads; Developer is fine for testing.
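
If you would rather script this step than run the CLI by hand, the same call can be sketched with boto3. This is a minimal sketch, not a definitive implementation: the helper names are made up for illustration, the edition check only covers the two editions discussed in this post, and the polling function assumes boto3 is installed and AWS credentials are configured.

```python
import time


def build_create_index_request(name, role_arn, edition="ENTERPRISE_EDITION"):
    """Return keyword arguments for the Kendra CreateIndex API call."""
    # Only the two editions discussed in this post are accepted here.
    allowed = {"DEVELOPER_EDITION", "ENTERPRISE_EDITION"}
    if edition not in allowed:
        raise ValueError(f"unsupported edition: {edition}")
    return {"Name": name, "RoleArn": role_arn, "Edition": edition}


def create_index_and_wait(request, poll_seconds=30):
    """Create the index and block until it is ACTIVE (calls AWS)."""
    import boto3  # imported lazily so the builder above works without it

    kendra = boto3.client("kendra")
    index_id = kendra.create_index(**request)["Id"]
    while True:
        status = kendra.describe_index(Id=index_id)["Status"]
        if status == "ACTIVE":
            return index_id
        if status == "FAILED":
            raise RuntimeError(f"index {index_id} failed to create")
        time.sleep(poll_seconds)


# Same values as the CLI example above.
request = build_create_index_request(
    "enterprise-search",
    "arn:aws:iam::123456789012:role/KendraRole",
)
```

Index creation is asynchronous, which is why the sketch polls describe_index before handing back the index ID.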

Data source connectors

Kendra supports 40+ native connectors. The most commonly used in enterprise settings are S3, SharePoint Online, Salesforce, and ServiceNow. Each connector typically requires:

A Secrets Manager secret containing auth credentials (not needed for S3, which relies on IAM alone)
An IAM role that lets Kendra read from the source
A sync schedule (hourly, daily, or on-demand)

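# S3 data source example (Terraform)
resource "aws_kendra_data_source" "docs" {
  index_id = aws_kendra_index.main.id
  name     = "s3-documents"
  type     = "S3"

  # Role Kendra assumes to read the bucket; required for non-CUSTOM types.
  # aws_iam_role.kendra_s3 is a hypothetical resource defined elsewhere.
  role_arn = aws_iam_role.kendra_s3.arn

  configuration {
    s3_configuration {
      bucket_name        = "my-enterprise-docs"
      inclusion_prefixes = ["manuals/", "policies/"]
    }
  }

  schedule = "cron(0 12 * * ? *)"  # daily at 12:00 UTC
}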
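
For on-demand syncs outside Terraform, the equivalent API calls can be sketched in Python. Treat this as a sketch under assumptions: the helper names are illustrative, the role ARN in the usage lines is a placeholder, and the second function assumes boto3 plus working AWS credentials.

```python
import time


def build_s3_data_source_request(index_id, role_arn, bucket, prefixes, schedule=""):
    """Return keyword arguments for the Kendra CreateDataSource API call."""
    request = {
        "IndexId": index_id,
        "Name": "s3-documents",
        "Type": "S3",
        "RoleArn": role_arn,
        "Configuration": {
            "S3Configuration": {
                "BucketName": bucket,
                "InclusionPrefixes": list(prefixes),
            }
        },
    }
    # Leaving the schedule out means the source only syncs on demand.
    if schedule:
        request["Schedule"] = schedule
    return request


def create_and_sync(request, poll_seconds=15):
    """Create the data source, wait for ACTIVE, then start one sync (calls AWS)."""
    import boto3  # imported lazily so the builder above works without it

    kendra = boto3.client("kendra")
    index_id = request["IndexId"]
    source_id = kendra.create_data_source(**request)["Id"]
    while True:
        status = kendra.describe_data_source(Id=source_id, IndexId=index_id)["Status"]
        if status == "ACTIVE":
            break
        if status == "FAILED":
            raise RuntimeError(f"data source {source_id} failed to create")
        time.sleep(poll_seconds)
    job = kendra.start_data_source_sync_job(Id=source_id, IndexId=index_id)
    return source_id, job["ExecutionId"]


# Placeholder index ID and role ARN; bucket and prefixes match the Terraform example.
s3_request = build_s3_data_source_request(
    "index-id-placeholder",
    "arn:aws:iam::123456789012:role/KendraS3Role",
    "my-enterprise-docs",
    ["manuals/", "policies/"],
)
```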

Document metadata for faceting

Metadata is what separates a good search experience from a great one. For each data source, you can define custom attributes (like department, document type, or last-modified-by) that users can filter on in the search UI.

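Metadata is supplied either via a companion JSON file (for S3, named after the document with a .metadata.json suffix) or extracted from document properties (for SharePoint, Salesforce, etc.). Once indexed, fields marked as facetable can be requested as facets in a query, and their value counts come back in the search results API response.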
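
To make the S3 companion file concrete, here is a small Python sketch that writes one next to a document. The attribute names department and doc_type are hypothetical custom fields (they would need to be defined on the index before Kendra recognizes them), and the file name and values are invented for illustration.

```python
import json
import os
import tempfile


def write_metadata_sidecar(document_path, title, attributes):
    """Write <document>.metadata.json next to the document and return its path."""
    sidecar_path = document_path + ".metadata.json"
    payload = {
        "Title": title,
        # Reserved fields start with an underscore; the rest are custom fields.
        "Attributes": dict(attributes),
    }
    with open(sidecar_path, "w", encoding="utf-8") as handle:
        json.dump(payload, handle, indent=2)
    return sidecar_path


# Demo in a temp directory; in practice the file is uploaded to the bucket
# alongside the document it describes.
workdir = tempfile.mkdtemp()
document = os.path.join(workdir, "vacation-policy.pdf")
sidecar = write_metadata_sidecar(
    document,
    "Vacation policy",
    {"_category": "policies", "department": "HR", "doc_type": "policy"},
)
print(sidecar)
```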

FAQ ingestion

FAQs let you pre-populate answers to common questions. Upload a CSV or JSON file of question/answer pairs to S3, register it with the index, and Kendra will surface these as direct answers at the top of results when a user's query closely matches.

# FAQ CSV format (with header row, i.e. the CSV_WITH_HEADER file format)
_question,_answer
"How do I reset my password?","Visit the IT portal at https://it.company.com/reset"
"What is the vacation policy?","Employees accrue 15 days per year. See HR handbook section 4.2."
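
Hand-editing CSV gets error-prone once answers contain commas or quotes, so it can help to generate the file. The sketch below uses Python's csv module for the escaping and builds the arguments for the CreateFaq API call. The helper names, FAQ name, bucket, key, and role ARN are placeholders rather than anything Kendra prescribes.

```python
import csv
import io


def render_faq_csv(pairs):
    """Render (question, answer) pairs as CSV text with the header row Kendra expects."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["_question", "_answer"])
    writer.writerows(pairs)  # the csv module quotes fields that need it
    return buffer.getvalue()


def build_create_faq_request(index_id, role_arn, bucket, key, name="it-hr-faq"):
    """Return keyword arguments for the Kendra CreateFaq API call."""
    return {
        "IndexId": index_id,
        "Name": name,
        "RoleArn": role_arn,
        "S3Path": {"Bucket": bucket, "Key": key},
        "FileFormat": "CSV_WITH_HEADER",
    }


faq_csv = render_faq_csv([
    ("How do I reset my password?", "Visit the IT portal at https://it.company.com/reset"),
    ("What is the vacation policy?", "Employees accrue 15 days per year. See HR handbook section 4.2."),
])
faq_request = build_create_faq_request(
    "index-id-placeholder",
    "arn:aws:iam::123456789012:role/KendraFaqRole",
    "my-enterprise-docs",
    "faqs/it-hr.csv",
)
```

After uploading faq_csv to the S3 key named in the request, passing faq_request to boto3's kendra client as create_faq(**faq_request) would register the FAQ.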

Query tuning

After ingestion, the tuning loop is straightforward: run sample queries against real data, review the results and user feedback in the console, and adjust field weightings or query-suggestion block lists as needed. Kendra also supports relevance overrides at the query level (the DocumentRelevanceOverrideConfigurations parameter of the Query API) if you need to boost specific sources dynamically.
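
A query that combines a filter, a facet request, and a per-query relevance override might look like the Python sketch below. The department and source_system attributes are hypothetical custom fields, the boost values are arbitrary, and the helper names are illustrative; only the parameter names come from the Query API.

```python
def build_query_request(index_id, text, department=None, source_boosts=None):
    """Return keyword arguments for the Kendra Query API call."""
    params = {
        "IndexId": index_id,
        "QueryText": text,
        # Ask for value counts of the (hypothetical) department field.
        "Facets": [{"DocumentAttributeKey": "department"}],
    }
    if department:
        params["AttributeFilter"] = {
            "EqualsTo": {"Key": "department", "Value": {"StringValue": department}}
        }
    if source_boosts:
        # Per-query boost: map field values to an importance from 1 to 10.
        params["DocumentRelevanceOverrideConfigurations"] = [
            {
                "Name": "source_system",
                "Relevance": {"Importance": 5, "ValueImportanceMap": dict(source_boosts)},
            }
        ]
    return params


def summarize_results(response):
    """Reduce a Query response to (type, title, uri, confidence) tuples."""
    summary = []
    for item in response.get("ResultItems", []):
        summary.append((
            item.get("Type"),
            item.get("DocumentTitle", {}).get("Text"),
            item.get("DocumentURI"),
            item.get("ScoreAttributes", {}).get("ScoreConfidence"),
        ))
    return summary


query = build_query_request(
    "index-id-placeholder",
    "What is the vacation policy?",
    department="HR",
    source_boosts={"sharepoint": 8, "s3": 3},
)
```

With boto3, the request would be sent as kendra.query(**query) and the response passed to summarize_results, which makes it easy to diff result lists between tuning runs.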

Cost considerations

Kendra is not cheap. Enterprise edition runs ~$1,000/month for the base index, plus charges for connector syncs and any additional storage or query capacity units. For most production use cases, it's worth it, but do your volume math before committing. The Developer edition (~$810/month) is a reasonable way to test the cost structure before moving to Enterprise.
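
The monthly figures above are just an hourly rate multiplied out over an always-on index. Here is that arithmetic as a short Python sketch. The hourly rates are assumptions picked to reproduce the numbers quoted in this post over a 30-day month, so check the current pricing page before budgeting, and note that connector and extra-capacity charges are not included.

```python
from decimal import Decimal

# Assumed base hourly rates in USD; verify against current AWS pricing.
HOURLY_RATE_USD = {
    "DEVELOPER_EDITION": Decimal("1.125"),
    "ENTERPRISE_EDITION": Decimal("1.40"),
}


def monthly_base_cost(edition, hours_per_day=24, days=30):
    """Base index cost for an index that runs around the clock."""
    return HOURLY_RATE_USD[edition] * hours_per_day * days


for edition in HOURLY_RATE_USD:
    print(edition, monthly_base_cost(edition))
```

The index bills for every hour it exists, not per query, so a test index left running over a weekend costs the same as a busy one.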

Wrapping up

Kendra is genuinely impressive for enterprise search at scale. The connectors reduce integration work substantially, and the ML relevance is meaningfully better than Elasticsearch out of the box. The main watch-outs are cost and the fact that the native UI is minimal — you'll need to build your own search interface on top of the API for anything beyond a demo.

Next post in this series: hooking Kendra up to a React frontend using the Query API and building a clean search result renderer.