Amazon Kendra Deep Dive

 

Amazon Kendra

  • An easy-to-use, ML-powered enterprise search service.
  • Allows developers to add search capabilities to their applications, enabling faster data discovery across the vast amounts of data spread throughout a company.
  • Datasets may include manuals, reports, FAQs, and human resources or customer service guides.
  • The data can be stored in many places: S3, SharePoint, Salesforce, ServiceNow, RDS, Microsoft OneDrive, etc.
  • When you type a question, Kendra uses ML algorithms to understand the context and return the most relevant results.

Steps to enable Kendra at a high level:

  1. Create an Amazon Kendra index.
  2. After the index is created, explore the available data source connectors based on where our data resides.
  3. Ingest the data through the selected connector. Based on the ingested data, create document metadata that helps with faceting and filtering the documents.
  4. Generate and ingest a list of FAQs.
  5. Run search queries and audit the answers we get.




Creating an Amazon Kendra Index:

Steps:
1. Access the Amazon Kendra console.
2. Create an index.
3. Enter an index name and select an IAM role (and a role name, if creating a new role).
4. For configuring user access control, keep the default (No).
5. Select one of the two available provisioning editions (Developer, Enterprise).
6. Click Create, and wait until the process completes (a programmatic equivalent is sketched below the note).

NOTE:
- Kendra automatically publishes error and alert logs to Amazon CloudWatch.
- A CloudWatch log group and corresponding log stream will be created for us.
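
The same index can be created with boto3. A minimal sketch, assuming credentials are configured; the index name, region, and role ARN are placeholders to replace with your own:

    import boto3

    kendra = boto3.client("kendra", region_name="us-east-1")  # assumed region

    # Create a Developer Edition index. The role (placeholder ARN) must allow
    # Kendra to publish logs and metrics to CloudWatch.
    response = kendra.create_index(
        Name="my-docs-index",                # hypothetical index name
        Edition="DEVELOPER_EDITION",         # or "ENTERPRISE_EDITION"
        RoleArn="arn:aws:iam::123456789012:role/KendraIndexRole",
    )
    index_id = response["Id"]

    # Index creation is asynchronous; check the status until it is ACTIVE.
    print(kendra.describe_index(Id=index_id)["Status"])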


Ingesting Documents:

We can ingest documents into Kendra using the following mechanisms (a BatchPutDocument sketch follows the list):
  1. Data sources: locations (such as SharePoint, Salesforce, or S3) where we store the documents for indexing. You can automatically synchronize data sources with the Kendra index so that new, updated, or deleted documents in the data source are also added, updated, or deleted in the index.
  2. FAQ documents: files that contain questions and answers, which can be uploaded through the console or with the CreateFaq API.
  3. The BatchPutDocument API: accepts inline blobs and S3 locations for documents.
  4. A custom data source, if needed, using the same BatchPutDocument API.
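
A hedged boto3 sketch of option 3; the index ID, role ARN, bucket, and key are placeholders:

    import boto3

    kendra = boto3.client("kendra")
    INDEX_ID = "REPLACE-WITH-YOUR-INDEX-ID"  # placeholder

    # Ingest one inline text blob and one document already sitting in S3.
    # The RoleArn (placeholder) is needed so Kendra can read the S3 object.
    kendra.batch_put_document(
        IndexId=INDEX_ID,
        RoleArn="arn:aws:iam::123456789012:role/KendraBatchPutRole",
        Documents=[
            {
                "Id": "doc-inline-1",
                "Blob": b"Our VPN policy requires MFA for all remote logins.",
                "ContentType": "PLAIN_TEXT",
            },
            {
                "Id": "doc-s3-1",
                "S3Path": {"Bucket": "my-docs-bucket", "Key": "guides/vpn.pdf"},
                "ContentType": "PDF",
            },
        ],
    )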

Unstructured text that can be ingested via connectors or the BatchPutDocument interface:
  • HTML files
  • Microsoft PowerPoint presentations
  • Microsoft Word documents
  • Plain text documents
  • PDFs


Amazon Kendra S3 Connector

Kendra offers an S3 connector that allows document ingestion (a scripted equivalent of the steps is sketched after the list).
The advantage of using the provided connector is that it can also ingest the metadata attributes associated with the original documents.

Steps:
  1. Create an S3 bucket to store your documents.
  2. Upload the required documents to the S3 bucket.
  3. Go to Data Management -> Data sources, select Amazon S3, and click Add connector.
  4. For the S3 data source, configure the sync settings: enter the data source location, an optional metadata files prefix, and an optional ACL configuration file.
  5. Under the additional configuration, you can define inclusion and exclusion patterns; add the S3 folder and click Add.
  6. For the sync run schedule, select Run on demand and click Next.
  7. On the set field mappings page, keep the default configuration and click Next.
  8. On the review and create page, click Add data source to complete the process of adding S3 as a data source.
  9. After the creation process completes, click Sync now.
  10. Time to test a query: go to Data Management -> Search indexed content.
  11. Type a query in the search bar to search for specific content.
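
A minimal boto3 sketch of the same flow; the index ID, data source name, role ARN, and bucket are placeholders:

    import boto3

    kendra = boto3.client("kendra")
    INDEX_ID = "REPLACE-WITH-YOUR-INDEX-ID"  # placeholder

    # Create the S3 data source; the role (placeholder) must let Kendra read the bucket.
    ds = kendra.create_data_source(
        IndexId=INDEX_ID,
        Name="s3-docs",                      # hypothetical data source name
        Type="S3",
        RoleArn="arn:aws:iam::123456789012:role/KendraS3Role",
        Configuration={
            "S3Configuration": {
                "BucketName": "my-docs-bucket",  # placeholder
                "DocumentsMetadataConfiguration": {"S3Prefix": "metadata/"},
            }
        },
    )

    # Equivalent of clicking "Sync now" in the console.
    kendra.start_data_source_sync_job(Id=ds["Id"], IndexId=INDEX_ID)

    # Once the sync finishes, search the indexed content.
    result = kendra.query(IndexId=INDEX_ID, QueryText="What is our VPN policy?")
    for item in result["ResultItems"]:
        print(item["Type"], "-", item.get("DocumentTitle", {}).get("Text"))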


Filtering search results (metadata documents)

  1. We can filter search results based on the Category field, using a category value.
  2. For example, select "Security" as the category to filter the results.
  3. Search results can be improved by creating a separate metadata document for each source document (see the sketch below).
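
A sketch of what such a metadata document can look like for the S3 connector; the bucket, key, and title are hypothetical, and the exact schema should be checked against the Kendra documentation. For a document at guides/vpn.pdf with a metadata prefix of metadata/, the connector looks for metadata/guides/vpn.pdf.metadata.json:

    import json
    import boto3

    s3 = boto3.client("s3")

    metadata = {
        "Title": "VPN Security Guide",     # hypothetical title
        "ContentType": "PDF",
        "Attributes": {
            "_category": "Security",       # reserved attribute backing the Category facet
        },
    }
    s3.put_object(
        Bucket="my-docs-bucket",           # placeholder
        Key="metadata/guides/vpn.pdf.metadata.json",
        Body=json.dumps(metadata).encode("utf-8"),
    )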

Adding fields to your Kendra index

  1. Click on Facet definition.
  2. Click on Add field.
  3. Enter the field name (use the same name as it appears in the metadata document), select the data type, and click Add.
  4. Save the added fields (a programmatic sketch follows).
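
A boto3 sketch using update_index; the index ID and field name are placeholders:

    import boto3

    kendra = boto3.client("kendra")

    # Add a custom string field matching the name used in the metadata document.
    kendra.update_index(
        Id="REPLACE-WITH-YOUR-INDEX-ID",   # placeholder
        DocumentMetadataConfigurationUpdates=[
            {
                "Name": "Department",      # hypothetical field name
                "Type": "STRING_VALUE",
                "Search": {
                    "Facetable": True,
                    "Searchable": True,
                    "Displayable": True,
                    "Sortable": False,
                },
            }
        ],
    )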

Updating the S3 connector

- As we change fields or add metadata files, we can update the index with the new files by running the Sync now job again.



Filtering Queries in Amazon Kendra

Under Data Management, select Facet definition.
For every column, we have the option of selecting any of the 4 options: Facetable, Searchable, Displayable, Sortable.
Click on Search indexed content and perform a search.


Using facets in a query

  • Once Facetable is selected on the columns, facets can be requested in a query (see the sketch below).
  • This adds a new key in the response called "FacetResults" that contains the facet values for the documents in the response.
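
A minimal boto3 sketch of a faceted query; the index ID is a placeholder and the attribute key assumes the reserved _category field:

    import boto3

    kendra = boto3.client("kendra")

    result = kendra.query(
        IndexId="REPLACE-WITH-YOUR-INDEX-ID",            # placeholder
        QueryText="security best practices",
        Facets=[{"DocumentAttributeKey": "_category"}],  # request facet counts
    )

    # Each facet value comes back with a document count.
    for facet in result.get("FacetResults", []):
        for pair in facet["DocumentAttributeValueCountPairs"]:
            print(pair["DocumentAttributeValue"]["StringValue"], pair["Count"])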

Making an index field sortable

  • Back in the index facet definition section, unmark Sortable on all the fields.
  • Run a query; you will notice that the only option for sorting is "Relevance".
  • Back on the facet definition, mark the fields as Sortable.
  • Run the query again, using the newly added field as the sorting parameter.
  • For example, the sketch below runs a query and sorts the results by the new attribute in ascending order.
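
A hedged boto3 sketch; the attribute name is hypothetical and must be marked Sortable in your index:

    import boto3

    kendra = boto3.client("kendra")

    # Sort results by a custom sortable attribute in ascending order.
    result = kendra.query(
        IndexId="REPLACE-WITH-YOUR-INDEX-ID",      # placeholder
        QueryText="security best practices",
        SortingConfiguration={
            "DocumentAttributeKey": "Department",  # hypothetical sortable field
            "SortOrder": "ASC",                    # or "DESC"
        },
    )
    for item in result["ResultItems"]:
        print(item.get("DocumentTitle", {}).get("Text"))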

Relevance Tuning

  • Allows you to boost a result in the response when the query includes terms that match the attribute.
  • To allow an attribute to be used to boost a document, you need to mark it as Searchable (a tuning sketch follows).
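
A sketch of tuning relevance through update_index, assuming the hypothetical Department field from earlier; Importance ranges from 1 (low) to 10 (high):

    import boto3

    kendra = boto3.client("kendra")

    # Boost documents whose "Department" attribute matches query terms.
    kendra.update_index(
        Id="REPLACE-WITH-YOUR-INDEX-ID",   # placeholder
        DocumentMetadataConfigurationUpdates=[
            {
                "Name": "Department",      # hypothetical field
                "Type": "STRING_VALUE",
                "Search": {
                    "Facetable": True,
                    "Searchable": True,    # required for relevance boosting
                    "Displayable": True,
                    "Sortable": False,
                },
                "Relevance": {"Importance": 8},
            }
        ],
    )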
