Introduction
In this blog post, we’ll walk through building a full-stack prototype that combines the power of Retrieval-Augmented Generation (RAG), keyword-level sentiment analysis, and OpenAI summaries using C#, Blazor Server, and Qdrant. We ingest data from both URLs and PDF files, perform analysis, and allow users to query a semantic vector store in real time.
Overview of the Workflow
This system provides:
- Dual ingestion sources: users can upload PDFs or specify URLs.
- Tokenization and embedding of page or chunk-level text.
- Keyword extraction and sentiment analysis, both page-wide and per keyword.
- Vector storage in Qdrant for semantic search.
- Keyword-level sentiment summaries using OpenAI’s Chat API.
- Blazor Server UI with charting for visual results.
Key Technologies Used
- ML.NET: For running a local ONNX model to score sentiment.
- Microsoft.ML.Tokenizers: For BERT-compatible tokenization.
- Qdrant: Lightweight vector store with cosine similarity.
- OpenAI .NET SDK (2.1.0): For embeddings and keyword sentiment summaries.
- Blazorise + Chart.js: For UI and interactive charting.
- PdfPig: For extracting text from PDFs.
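As a quick taste of the PdfPig piece, here is a minimal sketch of pulling page-level text out of a PDF. The "sample.pdf" path is just a placeholder for illustration, not part of the actual project.

using System;
using System.Collections.Generic;
using UglyToad.PdfPig;

// Minimal PdfPig sketch: open a PDF and collect the extracted text of each page.
var pageTexts = new List<string>();
using (var document = PdfDocument.Open("sample.pdf"))
{
    foreach (var page in document.GetPages())
    {
        // Page.Text returns the page's extracted text content.
        pageTexts.Add(page.Text);
    }
}
Console.WriteLine($"Extracted text from {pageTexts.Count} pages.");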
Architecture Breakdown
Services
- SentimentAnalyzerService: Uses ONNX + tokenizer to get sentiment logits.
- KeywordExtractorService: Extracts keyword frequencies from raw text.
- KeywordContextSentimentService: Runs sentiment on keyword-local context windows.
- TextChunker: Splits long documents into manageable chunks.
- EmbeddingService: Uses the OpenAI Embeddings API to get vector representations.
- VectorStoreService: Talks to Qdrant and supports search + upsert.
- KeywordSentimentSummaryService: Builds an averaged keyword sentiment map and uses GPT to summarize it.
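Of these services, the TextChunker is simple enough to sketch in full. The word-based splitting and the 200-word default below are illustrative assumptions; the real chunker may split on sentences or tokens instead.

using System;
using System.Collections.Generic;
using System.Linq;

public static class TextChunker
{
    // Splits raw text into chunks of at most maxWords words so each chunk
    // stays comfortably within the embedding model's input limits.
    public static List<string> Chunk(string text, int maxWords = 200)
    {
        // Split on any whitespace and drop empty entries.
        var words = text.Split((char[]?)null, StringSplitOptions.RemoveEmptyEntries);
        var chunks = new List<string>();
        for (var i = 0; i < words.Length; i += maxWords)
        {
            chunks.Add(string.Join(' ', words.Skip(i).Take(maxWords)));
        }
        return chunks;
    }
}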
Controller Endpoints
- api/rag/analyze: For URL ingestion and sentiment processing.
- api/pdf/analyze: Accepts PDF uploads, runs chunk analysis, sentiment, and stores vectors.
- api/rag/query: Accepts a question and performs similarity search against stored vectors.
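To make the endpoint shapes concrete, here is a rough sketch of how the query endpoint could be wired up. The IEmbeddingService and IVectorStoreService interfaces and the QueryRequest DTO are illustrative names, not the project's exact types.

using System.Collections.Generic;
using System.Threading.Tasks;
using Microsoft.AspNetCore.Mvc;

// Hypothetical abstractions over the OpenAI Embeddings API and Qdrant.
public interface IEmbeddingService
{
    Task<float[]> EmbedAsync(string text);
}

public interface IVectorStoreService
{
    Task<IReadOnlyList<string>> SearchAsync(float[] vector, int topK);
}

[ApiController]
[Route("api/rag")]
public class RagController : ControllerBase
{
    private readonly IEmbeddingService _embeddings;
    private readonly IVectorStoreService _vectorStore;

    public RagController(IEmbeddingService embeddings, IVectorStoreService vectorStore)
    {
        _embeddings = embeddings;
        _vectorStore = vectorStore;
    }

    public record QueryRequest(string Question, int TopK = 5);

    [HttpPost("query")]
    public async Task<IActionResult> Query([FromBody] QueryRequest request)
    {
        // Embed the question, then run a similarity search against the stored chunks.
        var vector = await _embeddings.EmbedAsync(request.Question);
        var hits = await _vectorStore.SearchAsync(vector, request.TopK);
        return Ok(hits);
    }
}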
Key Feature: Keyword Sentiment Summary
Once page-level and keyword-level sentiment scores are collected across all documents or PDFs, the system averages the keyword scores and sends them to GPT-4:
var prompt = BuildPromptFromAveragedSentiments(averaged);

// ChatClient and ChatMessage come from the OpenAI.Chat namespace (OpenAI .NET SDK 2.x).
var chatClient = _openai.GetChatClient("gpt-4");
var chatMessages = new List<ChatMessage>
{
    ChatMessage.CreateSystemMessage("You are an analyst. Provide a short summary of the keyword-level sentiment results."),
    ChatMessage.CreateUserMessage(prompt)
};
var response = await chatClient.CompleteChatAsync(chatMessages);
var summary = response.Value.Content[0].Text;
This gives a human-readable synthesis of what the model thinks about your documents—great for reporting or analysis.
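The BuildPromptFromAveragedSentiments helper referenced above isn't shown here. One plausible sketch, assuming averaged is a Dictionary<string, double> mapping each keyword to its mean score on a roughly -1 to 1 scale, might look like this as a member of KeywordSentimentSummaryService; the real prompt format may differ.

// Hypothetical prompt builder: turns the averaged keyword map into a plain-text
// list that GPT can summarize. The -1..1 scale is an assumption.
private static string BuildPromptFromAveragedSentiments(Dictionary<string, double> averaged)
{
    var sb = new StringBuilder();
    sb.AppendLine("Average sentiment score per keyword (-1 = very negative, 1 = very positive):");
    foreach (var (keyword, score) in averaged.OrderByDescending(kv => kv.Value))
    {
        sb.AppendLine($"- {keyword}: {score:F2}");
    }
    return sb.ToString();
}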
UI Components
- RAGAnalyzer.razor: Accepts URLs, performs analysis, and shows results.
- UploadPdf.razor: Accepts PDFs, runs the same pipeline, and stores results.
- Charts: PageSentimentChart, KeywordSentimentChart, KeywordChart
- Query Component: Lets you search the vector DB by embedding your query and showing the top matching chunks.
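Behind that component, the query flow is essentially "embed the question, then search Qdrant." Here is a minimal sketch using the OpenAI SDK and the Qdrant.Client package; the collection name "documents", the payload key "text", and the text-embedding-3-small model are assumptions, not necessarily what the project uses.

using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;
using OpenAI;
using Qdrant.Client;

public class QueryService
{
    private readonly OpenAIClient _openai;
    private readonly QdrantClient _qdrant;

    public QueryService(OpenAIClient openai, QdrantClient qdrant)
    {
        _openai = openai;
        _qdrant = qdrant;
    }

    // Embed the question and return the text of the closest stored chunks.
    // Collection name, payload key, and embedding model are assumptions.
    public async Task<IReadOnlyList<string>> QueryAsync(string question, ulong topK = 5)
    {
        var embeddingClient = _openai.GetEmbeddingClient("text-embedding-3-small");
        var embedding = await embeddingClient.GenerateEmbeddingAsync(question);

        var hits = await _qdrant.SearchAsync(
            collectionName: "documents",
            vector: embedding.Value.ToFloats().ToArray(),
            limit: topK);

        return hits.Select(h => h.Payload["text"].StringValue).ToList();
    }
}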
Example Use Case
Imagine you’re analyzing press releases, company filings, or product reviews. Paste in a few URLs or upload PDF documents, provide key terms like “revenue”, “climate”, or “safety”, and let the app:
- Analyze how each keyword is viewed in each source.
- Store those insights as vector chunks.
- Summarize keyword sentiment.
- Allow query access using natural language.
Sample Output:
Keyword Sentiment Summary:
- "profit": moderately positive
- "layoffs": highly negative
- "growth": slightly positive
Summary: Overall, the documents emphasize profitability with moderate optimism. Layoffs were discussed negatively, while growth projections are cautiously positive.
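In the sample above, GPT chooses the wording of labels like "moderately positive". If you wanted to derive those labels deterministically from the averaged scores instead, a simple threshold mapping would do; the cutoffs below are purely illustrative, not values taken from the app.

// Hypothetical mapping from an averaged score in [-1, 1] to a readable label.
static string ToLabel(double score) => score switch
{
    >= 0.6  => "highly positive",
    >= 0.3  => "moderately positive",
    > 0.1   => "slightly positive",
    >= -0.1 => "neutral",
    >= -0.3 => "slightly negative",
    > -0.6  => "moderately negative",
    _       => "highly negative"
};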
Lessons Learned
- ONNX with ML.NET is a great choice for local sentiment without sending data to the cloud.
- Qdrant is incredibly lightweight and fast for semantic similarity.
- OpenAI summaries give clarity to otherwise raw logits and scores.
- Blazor Server and Blazorise provide a robust UI pattern that feels modern and reactive.
Final Thoughts
This prototype illustrates the full power of combining local ML (via ONNX), semantic embeddings (via OpenAI), vector databases (via Qdrant), and interactive UI (via Blazor). It’s designed to be extensible, to follow SOLID principles, and to serve as a strong starting point for production work.
With this setup, you can:
- Scale to PDFs, HTML pages, or any document corpus.
- Add per-document classifiers, labels, or redaction.
- Evolve into a complete enterprise knowledge extraction tool.
Let me know if you’d like to extend this to:
- LLM completion directly over chunks
- In-browser text highlighting
- Source citation in query answers
- Scheduled crawl + ingestion of new data
Happy hacking!

