Amazon Kendra
- easy-to-use enterprise search service powered by machine learning (ML)
- allows developers to add search capabilities to their applications, which speeds up data discovery across the vast amount of content spread throughout a company
- datasets may include manuals, reports, FAQs, and human resources or customer service guides
- data can be stored in S3, SharePoint, Salesforce, ServiceNow, RDS, Microsoft OneDrive, and other repositories
- when you type a question, Kendra uses ML algorithms to understand the context and return the most relevant results
Steps to enable Kendra at a high level:
- Create an Amazon Kendra Index
- After the index is created, choose a data source connector based on where the data resides.
- Ingest the data through the selected connector, then create document metadata for the ingested data to support faceting and filtering of the documents.
- Next, generate and ingest a list of FAQs.
- Run search queries and audit the answers we get.
Creating an Amazon Kendra Index:
steps:
1. Access Amazon Kendra Console
2. Create an index
3. Enter the index name and select an IAM role (enter a role name if creating a new role)
4. For user access control, keep the default setting (No).
5. Select one of the two available provisioning editions (Developer, Enterprise)
6. Click Create and wait until the process is completed.
NOTE:
- Kendra automatically publishes error and alert logs to Amazon CloudWatch.
- A CloudWatch log group and corresponding log stream will be created for us.
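The console steps above could also be scripted. The following is a minimal sketch, assuming the boto3 Kendra client and its `create_index` and `describe_index` calls; the helper names, the polling interval, and the `client` argument are illustrative and not part of the original walkthrough.

```python
import time


def create_kendra_index(client, name, role_arn, edition="DEVELOPER_EDITION"):
    """Request a new index and return its ID."""
    response = client.create_index(
        Name=name,
        RoleArn=role_arn,  # IAM role Kendra assumes, e.g. to write CloudWatch logs
        Edition=edition,   # "DEVELOPER_EDITION" or "ENTERPRISE_EDITION"
    )
    return response["Id"]


def wait_until_created(client, index_id, poll_seconds=30, max_polls=120):
    """Poll describe_index until the index leaves the CREATING state."""
    for _ in range(max_polls):
        status = client.describe_index(Id=index_id)["Status"]
        if status != "CREATING":
            return status  # "ACTIVE" on success, "FAILED" otherwise
        time.sleep(poll_seconds)
    raise TimeoutError(f"Index {index_id} is still being created")
```

With boto3 installed and credentials configured, `client` would be `boto3.client("kendra")`.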
Ingesting Documents:
We can ingest documents to Kendra using the following mechanisms:
- Data sources: a location (such as SharePoint, Salesforce, or S3) where the documents to index are stored. A data source can be synchronized automatically with the Kendra index so that documents added, updated, or deleted in the data source are also added, updated, or deleted in the index.
- FAQ documents: files containing questions and answers, which can be uploaded through the console or with the CreateFaq API
- BatchPutDocument API: accepts inline blobs and S3 locations for documents
- Custom data source: if no connector fits, create a custom data source and ingest with the same BatchPutDocument API
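As an illustration of the FAQ route, here is a hedged sketch that writes question and answer pairs in the basic CSV layout (question, answer, optional source URI) and registers the uploaded file through the boto3 `create_faq` call. The helper names and arguments are hypothetical, and uploading the CSV to S3 is left out.

```python
import csv
import io


def faq_rows_to_csv(rows):
    """Serialize (question, answer[, source_uri]) tuples as CSV text."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    for row in rows:
        writer.writerow(row)
    return buffer.getvalue()


def register_faq(client, index_id, name, bucket, key, role_arn):
    """Point Kendra at a FAQ file that is already stored in S3."""
    response = client.create_faq(
        IndexId=index_id,
        Name=name,
        S3Path={"Bucket": bucket, "Key": key},
        RoleArn=role_arn,   # role that can read the FAQ file
        FileFormat="CSV",   # basic CSV without a header row
    )
    return response["Id"]
```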
Unstructured text formats that can be ingested via connectors or the BatchPutDocument API:
- HTML files
- Microsoft PowerPoint presentations
- Microsoft Word documents
- Plain text documents
- PDFs
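A possible way to send inline documents of these formats through BatchPutDocument is sketched below. It assumes the boto3 `batch_put_document` call and its `ContentType` values; the extension mapping and the helper names are illustrative assumptions.

```python
import os

# Assumed mapping from file extension to the API's ContentType values.
CONTENT_TYPES = {
    ".html": "HTML",
    ".htm": "HTML",
    ".ppt": "PPT",
    ".pptx": "PPT",
    ".doc": "MS_WORD",
    ".docx": "MS_WORD",
    ".txt": "PLAIN_TEXT",
    ".pdf": "PDF",
}


def build_document(doc_id, title, filename, blob):
    """Describe one inline document; raises KeyError for unknown extensions."""
    extension = os.path.splitext(filename)[1].lower()
    return {
        "Id": doc_id,
        "Title": title,
        "Blob": blob,  # raw bytes of the file
        "ContentType": CONTENT_TYPES[extension],
    }


def put_documents(client, index_id, documents):
    """Index the documents and return the ones Kendra rejected."""
    response = client.batch_put_document(IndexId=index_id, Documents=documents)
    return response.get("FailedDocuments", [])
```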
Amazon Kendra S3 Connector
Kendra offers an S3 connector for document ingestion.
An advantage of the provided connector is that it can ingest the metadata attributes associated with each original document.
Steps:
- Create an S3 bucket to store your documents
- Upload the required documents to the S3 bucket
- Go to Data Management -> Data Sources -> Select sample dataset (Amazon S3). Select Amazon S3 (Add connector)
- For the S3 data source, configure the sync settings: enter the data source location and, optionally, a metadata files prefix and an ACL configuration file
- Under Additional configuration, you can define inclusion and exclusion patterns; add the S3 folder and click Add
- Set the sync run schedule: select Run on demand, then click Next
- On the Set field mappings page, keep the default configuration and click Next
- On the Review and create page, click Add data source to finish adding S3 as a data source
- After the creation process is complete, click Sync now
- Test a query by going to Data Management -> Search indexed content
- Type a query in the search bar to search for specific content
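The same connector setup might be scripted roughly as follows. This is a sketch assuming the boto3 `create_data_source` and `start_data_source_sync_job` calls; the helper names are hypothetical, and leaving out a schedule is intended to mirror the "run on demand" choice.

```python
def build_s3_configuration(bucket, inclusion_prefixes=None,
                           exclusion_patterns=None, metadata_prefix=None,
                           acl_key_path=None):
    """Assemble the S3 connector settings; optional parts are added only if set."""
    config = {"BucketName": bucket}
    if inclusion_prefixes:
        config["InclusionPrefixes"] = list(inclusion_prefixes)
    if exclusion_patterns:
        config["ExclusionPatterns"] = list(exclusion_patterns)
    if metadata_prefix:
        config["DocumentsMetadataConfiguration"] = {"S3Prefix": metadata_prefix}
    if acl_key_path:
        config["AccessControlListConfiguration"] = {"KeyPath": acl_key_path}
    return {"S3Configuration": config}


def create_s3_data_source(client, index_id, name, role_arn, configuration):
    """Create the data source without a schedule, so syncs run on demand."""
    response = client.create_data_source(
        IndexId=index_id,
        Name=name,
        Type="S3",
        RoleArn=role_arn,
        Configuration=configuration,
    )
    return response["Id"]


def sync_now(client, index_id, data_source_id):
    """Start a sync job; the data source may first need to finish creating."""
    response = client.start_data_source_sync_job(Id=data_source_id, IndexId=index_id)
    return response["ExecutionId"]
```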
Filtering search results (Metadata documents)
- We can filter search results by the value of the Category field
- For example, select "Security" as the category to filter the results
- Search results can be improved by creating a separate metadata document
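For the S3 connector, a metadata document is a JSON file named after the document it describes, with a `.metadata.json` suffix. The sketch below builds such a file; the helper name and the example `department` attribute are assumptions, and `_category` is used on the assumption that it is the attribute behind the Category field.

```python
import json


def build_metadata_file(document_key, title, category, extra_attributes=None):
    """Return (metadata_key, json_text) for one document stored in S3."""
    attributes = {"_category": category}
    attributes.update(extra_attributes or {})
    body = {"Title": title, "Attributes": attributes}
    # The metadata file name is the document name plus ".metadata.json".
    return document_key + ".metadata.json", json.dumps(body, indent=2)
```

The resulting text would be uploaded next to the document, or under the metadata files prefix if one is configured on the data source.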
Adding fields to your Kendra index
- Click on Facet Definitions
- Click on Add field
- Enter the field name (it must match the name used in the metadata document), select the data type, and click Add
- Save the added fields
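Adding a field can also be expressed through the API. The following sketch assumes the boto3 `update_index` call with `DocumentMetadataConfigurationUpdates`; the helper names and the default flag values are illustrative choices.

```python
def build_field(name, value_type="STRING_VALUE", facetable=False,
                searchable=False, displayable=True, sortable=False):
    """Describe one index field; value_type is e.g. STRING_VALUE or DATE_VALUE."""
    return {
        "Name": name,  # must match the attribute name in the metadata document
        "Type": value_type,
        "Search": {
            "Facetable": facetable,
            "Searchable": searchable,
            "Displayable": displayable,
            "Sortable": sortable,
        },
    }


def add_index_fields(client, index_id, fields):
    """Send the new or changed field definitions to the index."""
    client.update_index(Id=index_id, DocumentMetadataConfigurationUpdates=fields)
```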
Updating the s3 connector
- After changing the fields or adding metadata files, run the Sync now job again to update the index with the new files.
Filtering Queries in Amazon Kendra
Under Data Management -> select Facet definition
For every field, we can enable one or more of four options: Facetable, Searchable, Displayable, Sortable
Click on Search indexed content and perform a search.
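Outside the console, a filtered search could look like the sketch below. It assumes the boto3 `query` call with an `AttributeFilter`, and that the category is stored in the `_category` attribute; the helper names are hypothetical.

```python
def category_filter(category):
    """Build a filter that keeps only documents in the given category."""
    return {"EqualsTo": {"Key": "_category", "Value": {"StringValue": category}}}


def search(client, index_id, text, attribute_filter=None):
    """Run a query and return (title, uri) pairs for the results."""
    params = {"IndexId": index_id, "QueryText": text}
    if attribute_filter is not None:
        params["AttributeFilter"] = attribute_filter
    response = client.query(**params)
    return [
        (item.get("DocumentTitle", {}).get("Text"), item.get("DocumentURI"))
        for item in response.get("ResultItems", [])
    ]
```

For example, `search(client, index_id, "password policy", category_filter("Security"))` would restrict the results to the "Security" category.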
Using facets in a query
- Mark the relevant fields as Facetable
- The query response then includes a new key called "FacetResults" that contains the facet values for the documents in the response
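To show what "FacetResults" might look like in code, here is a sketch that requests facets through the boto3 `query` call and flattens the response into per-value counts. The helper names are assumptions, and only string-valued facets are handled.

```python
def summarize_facets(facet_results):
    """Turn FacetResults into {attribute: {value: count}} for string facets."""
    summary = {}
    for facet in facet_results:
        counts = {}
        for pair in facet.get("DocumentAttributeValueCountPairs", []):
            value = pair["DocumentAttributeValue"].get("StringValue")
            counts[value] = pair["Count"]
        summary[facet["DocumentAttributeKey"]] = counts
    return summary


def query_with_facets(client, index_id, text, facet_keys):
    """Query the index and ask for facet counts on the given attributes."""
    response = client.query(
        IndexId=index_id,
        QueryText=text,
        Facets=[{"DocumentAttributeKey": key} for key in facet_keys],
    )
    return summarize_facets(response.get("FacetResults", []))
```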
Making an index field sortable
- Back in the index facet definition section, clear Sortable for all the fields
- Run a query; the only option for sorting is "Relevance"
- Back in the facet definition section, mark the fields as Sortable
- Run the query again, this time using the newly sortable field as the sorting parameter
- For example, use code to run a query and sort the results by the new attribute in ascending order.
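One way such code might look is sketched here, assuming the boto3 `query` call and its `SortingConfiguration` parameter. The helper name is hypothetical, and the attribute passed as `sort_key` must already be marked Sortable in the index.

```python
def sorted_query(client, index_id, text, sort_key, ascending=True):
    """Run a query sorted by an index field and return the document IDs in order."""
    response = client.query(
        IndexId=index_id,
        QueryText=text,
        SortingConfiguration={
            "DocumentAttributeKey": sort_key,
            "SortOrder": "ASC" if ascending else "DESC",
        },
    )
    return [item.get("DocumentId") for item in response.get("ResultItems", [])]
```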
Relevance Tuning
- Relevance tuning lets you boost a result in the response when the query includes terms that match an attribute
- To use an attribute for boosting a document, mark it as Searchable
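A boost of this kind might be configured as in the sketch below, which assumes the boto3 `update_index` call and the `Relevance` setting with an `Importance` value, taken here to range from 1 to 10. The helper names and the `department` field are illustrative.

```python
def build_boosted_field(name, importance, value_type="STRING_VALUE"):
    """Describe a searchable field with a relevance boost (assumed range 1-10)."""
    if not 1 <= importance <= 10:
        raise ValueError("importance must be between 1 and 10")
    return {
        "Name": name,
        "Type": value_type,
        "Relevance": {"Importance": importance},
        "Search": {"Searchable": True},  # needed before the field can boost results
    }


def apply_relevance_tuning(client, index_id, fields):
    """Send the boosted field definitions to the index."""
    client.update_index(Id=index_id, DocumentMetadataConfigurationUpdates=fields)
```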