Semantic Kernel study notes: a first look at generating embeddings with Semantic Memory and running a semantic search
Semantic Kernel's Memory comes in two implementations: the Semantic Memory built into Semantic Kernel, and the standalone Kernel Memory, which grew out of Semantic Kernel.
Background on Semantic Memory:
Semantic Memory (SM) is a library for C#, Python, and Java that wraps direct calls to databases and supports vector search. It was developed as part of the Semantic Kernel (SK) project and serves as the first public iteration of long-term memory. The core library is maintained in three languages, while the list of supported storage engines (known as "connectors") varies across languages.
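Goal: call the OpenAI API through Semantic Memory, generate text embeddings with the text-embedding-ada-002 model, store them in an in-memory vector store, and then run a semantic search.
Study material: the sample program Example14_SemanticMemory.cs in the Semantic Kernel source repository.
Create a .NET console project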
dotnet new console
dotnet add package Microsoft.SemanticKernel
dotnet add package --prerelease Microsoft.SemanticKernel.Plugins.Memory
Create an ISemanticTextMemory instance
Use MemoryBuilder, backed by OpenAITextEmbeddingGenerationService, to create a SemanticTextMemory, which is an implementation of ISemanticTextMemory.
#pragma warning disable SKEXP0011
#pragma warning disable SKEXP0003
#pragma warning disable SKEXP0052
// apiKey is assumed to hold your OpenAI API key, defined earlier in the program.
ISemanticTextMemory memory = new MemoryBuilder()
    .WithOpenAITextEmbeddingGeneration("text-embedding-ada-002", apiKey)
    .WithMemoryStore(new VolatileMemoryStore())
    .Build();
#pragma warning restore SKEXP0052
#pragma warning restore SKEXP0003
#pragma warning restore SKEXP0011
Note: the warning disable directives in the code above are needed because MemoryBuilder and the two extension methods are experimental features.
Prepare the text data used to generate embeddings
var sampleData = new Dictionary<string, string>
{
    ["https://github.com/microsoft/semantic-kernel/blob/main/README.md"]
        = "README: Installation, getting started, and how to contribute",
    ["https://github.com/microsoft/semantic-kernel/blob/main/dotnet/notebooks/02-running-prompts-from-file.ipynb"]
        = "Jupyter notebook describing how to pass prompts from a file to a semantic plugin or function"
};
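Generate the embeddings and save them to the in-memory vector store
var i = 0;
foreach (var entry in sampleData)
{
    // Each call generates an embedding for `text` and stores it with the reference metadata.
    await memory.SaveReferenceAsync(
        collection: "SKGitHub",
        externalSourceName: "GitHub",
        externalId: entry.Key,
        description: entry.Value,
        text: entry.Value);
    // WriteLine so that each entry prints on its own line, as in the output listed at the end.
    Console.WriteLine($" #{++i} saved.");
}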
Internally, SaveReferenceAsync calls the GenerateEmbeddingAsync method of IEmbeddingGenerationService to generate the embedding; see SemanticTextMemory.cs#L60 in the SK source:
var embedding = await this._embeddingGenerator.GenerateEmbeddingAsync(text, kernel, cancellationToken).ConfigureAwait(false);
Note: the embedding value is of type ReadOnlyMemory<float>.
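As a small illustration of what working with a ReadOnlyMemory<float> value looks like, here is a minimal sketch. The four-element vector is a made-up stand-in (a real text-embedding-ada-002 embedding has 1536 dimensions), and none of this code comes from the SK source.

```csharp
using System;

// Hypothetical 4-dimensional vector standing in for a real embedding.
// A float[] converts implicitly to ReadOnlyMemory<float>.
ReadOnlyMemory<float> embedding = new float[] { 0.1f, -0.2f, 0.3f, 0.4f };

// Length is the number of dimensions of the vector.
Console.WriteLine($"Dimensions: {embedding.Length}");

// Span gives a read-only, no-copy view for element access.
Console.WriteLine($"Third component: {embedding.Span[2]}");

// ToArray copies the data into a new array, e.g. to hand it to other APIs.
float[] copy = embedding.ToArray();
Console.WriteLine($"Copied {copy.Length} values.");
```

The value is read-only by design, so code that receives an embedding can inspect it but cannot modify the stored vector in place.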
Since we are using OpenAI here, the call goes to the GenerateEmbeddingsAsync method of OpenAITextEmbeddingGenerationService (see the SK source), which ultimately calls the GetEmbeddingsAsync method of Azure.AI.OpenAI.OpenAIClient; see OpenAIClient.cs#L552 in the Azure SDK for .NET source.
Run a semantic search over the embedding data
var query = "How do I get started?";
var memoryResults = memory.SearchAsync("SKGitHub", query, limit: 1, minRelevanceScore: 0.5);
SearchAsync also calls GenerateEmbeddingsAsync, this time to generate an embedding from the query text; see SemanticTextMemory.cs#L108.
Print the semantic search results
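To make the idea of comparing a query embedding against stored embeddings more concrete, here is a minimal sketch. It assumes relevance is measured as cosine similarity, which is a common choice for embedding search; it is not the actual VolatileMemoryStore code, and the three-dimensional vectors, the ids, and the CosineSimilarity helper are all hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Cosine similarity: dot(a, b) / (|a| * |b|). Assumed relevance measure.
static double CosineSimilarity(ReadOnlySpan<float> a, ReadOnlySpan<float> b)
{
    if (a.Length != b.Length)
        throw new ArgumentException("Vectors must have the same length.");

    double dot = 0, normA = 0, normB = 0;
    for (var i = 0; i < a.Length; i++)
    {
        dot += a[i] * b[i];
        normA += a[i] * a[i];
        normB += b[i] * b[i];
    }
    return dot / (Math.Sqrt(normA) * Math.Sqrt(normB));
}

// Made-up vectors standing in for stored embeddings.
var store = new Dictionary<string, float[]>
{
    ["readme"] = new[] { 0.9f, 0.1f, 0.0f },
    ["notebook"] = new[] { 0.1f, 0.9f, 0.2f },
};

// Made-up vector standing in for the embedding of the query text.
float[] queryVector = { 1.0f, 0.0f, 0.0f };

// Score every entry, drop those below the threshold, keep the best one.
// This mirrors the roles of minRelevanceScore: 0.5 and limit: 1.
var best = store
    .Select(e => (Id: e.Key, Relevance: CosineSimilarity(e.Value, queryVector)))
    .Where(r => r.Relevance >= 0.5)
    .OrderByDescending(r => r.Relevance)
    .Take(1)
    .ToList();

foreach (var r in best)
{
    Console.WriteLine($"{r.Id}: {r.Relevance}");
}
```

With these made-up vectors only the "readme" entry clears the 0.5 threshold, so it is the single result returned. A brute-force scan like this is fine for a handful of entries; larger collections usually call for a dedicated vector database.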
await foreach (var memoryResult in memoryResults)
{
    // WriteLine so that each field prints on its own line, as in the output below.
    Console.WriteLine("Result:");
    Console.WriteLine(" URL: : " + memoryResult.Metadata.Id);
    Console.WriteLine(" Title : " + memoryResult.Metadata.Description);
    Console.WriteLine(" Relevance: " + memoryResult.Relevance);
}
Run the console program
Output:
#1 saved.
#2 saved.
Result:
URL: : https://github.com/microsoft/semantic-kernel/blob/main/README.md
Title : README: Installation, getting started, and how to contribute
Relevance: 0.8224089741706848
The search succeeded, which wraps up this exercise. The complete sample code is at https://www.cnblogs.com/dudu/articles/18037216