MarketForge • Proprietary

AI Knowledge Slot Curation

Transform unstructured complexity into highly reliable, semantically matchable data structures.

Make The Unmatchable, Matchable

Thin markets fail primarily due to information gaps: buyers and sellers describe the same needs in very different terms. DeeperPoint's semantic Knowledge Slots close that gap by defining the exact units of information a deal requires.

As the proprietary data-refinement wing of the MarketForge ensemble, the AI Knowledge Slot Curation tool uses Large Language Models to digest messy documents (PDFs, spreadsheets, technical manuals, unstructured text) and align their contents with your ecosystem's Knowledge Slots.

How It Integrates

✅ Extraction: Pulls strict entities, specs, and criteria out of unstructured noise.
✅ Normalization: Translates diverse expressions into a universally understood schema.
✅ Feeding Cosolvent: Curated slots flow directly into the Cosolvent matching engine.


Feature Sheet

KnowledgeSlot Feature Overview

✅ Implemented 🔜 Planned

Document Ingestion

✅	Multi-Format Parsing	PDF (pymupdf4llm), DOCX, PPTX, Markdown, and plain text ingestion with metadata extraction.
✅	Semantic Chunking	LLM-driven segmentation into self-contained knowledge units with titles and summaries.
✅	Vector Embedding	OpenAI text-embedding-3-small with pgvector storage and cosine-distance search.
✅	Duplicate Detection	SHA-256 content hashing with similarity thresholds prevents redundant ingestion.
🔜	URL Scraping	Ingest directly from web pages, government databases, and industry portals.
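
The Duplicate Detection and Vector Embedding rows above can be illustrated with a minimal sketch. It is not the product's implementation: the function names, the whitespace normalization, and the 0.05 distance threshold are all assumptions chosen for illustration, and the vectors are plain Python lists rather than pgvector columns.

```python
import hashlib
import math


def content_hash(text: str) -> str:
    """SHA-256 hex digest of the text after collapsing whitespace (assumed normalization)."""
    normalized = " ".join(text.split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def cosine_distance(a: list[float], b: list[float]) -> float:
    """Cosine distance (1 - cosine similarity); returns 1.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 1.0
    return 1.0 - dot / (norm_a * norm_b)


def is_duplicate(
    new_text: str,
    new_vec: list[float],
    existing: list[tuple[str, list[float]]],
    max_distance: float = 0.05,  # hypothetical threshold, not taken from the product
) -> bool:
    """Return True on an exact hash match or a near-duplicate embedding.

    `existing` holds (content_hash, embedding) pairs for already-ingested content.
    """
    new_hash = content_hash(new_text)
    for known_hash, known_vec in existing:
        if known_hash == new_hash:
            return True
        if cosine_distance(new_vec, known_vec) <= max_distance:
            return True
    return False
```

The hash check catches byte-identical content cheaply, while the distance check catches reworded copies that hash differently.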

Schema Intelligence

✅	Vertical-Specific Metadata	Tag documents with vertical, region, and topic for filtered retrieval.
✅	Topic Taxonomy	Hierarchical topic trees for organized domain knowledge browsing.
🔜	Authority Grading	Rank sources by reliability: peer-reviewed journals and government data vs. general web content.
🔜	Schema Auto-Discovery	Analyze ingested documents to suggest new metadata fields and taxonomies.

Retrieval & Integration

✅	Hybrid Search	Combined vector similarity + keyword search with metadata filters.
✅	Domain Q&A	RAG-powered question answering grounded in the curated reference library.
✅	Cosolvent Integration	Feed curated knowledge directly into Cosolvent's matching and Content Match Story pipelines.
🔜	Cross-Vertical Linking	Discover connections between knowledge in different verticals for multi-market insights.

Curation Workflow

✅	Sponsor Dashboard	Web UI for browsing, searching, and managing the reference library.
✅	Chunk Review Interface	Inspect individual chunks, edit metadata, and verify extracted knowledge units.
🔜	Staleness Detection	Flag documents past their review date or with superseded source data.
🔜	Curatorial Pull Signal	Identify knowledge gaps from failed matches and prompt sponsors to add missing references.
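
Staleness Detection is listed as planned, so the following is only a guess at the rule it describes: flag a document when its review date has passed or its source has been superseded. The `ReferenceDoc` type and its fields are hypothetical.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class ReferenceDoc:
    """Hypothetical library entry with the two fields the rule needs."""
    title: str
    review_by: date
    superseded: bool = False


def stale_documents(docs: list[ReferenceDoc], today: date) -> list[ReferenceDoc]:
    """Return documents past their review date or marked as superseded."""
    return [d for d in docs if d.superseded or d.review_by < today]
```

Passing `today` explicitly, rather than calling `date.today()` inside the function, keeps the check deterministic and easy to test.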

Provenance & Trust

✅	Source Tracking	Every chunk traces back to its source document, page, and upload context.
✅	Content Hashing	SHA-256 hashing prevents re-ingestion of identical content.
🔜	Citation Generation	Auto-generate citations when knowledge is used in Content Match Stories.
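
A provenance record of the kind described above might look like the sketch below, which pairs each chunk with its source document, page, upload context, and a SHA-256 digest of its text. The `ChunkProvenance` type, its field names, and the helper functions are assumptions for illustration only.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class ChunkProvenance:
    """Hypothetical provenance record attached to one chunk."""
    source_document: str
    page: int
    upload_context: str
    content_sha256: str


def make_provenance(
    text: str, source_document: str, page: int, upload_context: str
) -> ChunkProvenance:
    """Build a provenance record, hashing the chunk text with SHA-256."""
    digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
    return ChunkProvenance(source_document, page, upload_context, digest)


def verify(text: str, provenance: ChunkProvenance) -> bool:
    """Check that the chunk text still matches the hash recorded at ingestion."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest() == provenance.content_sha256
```

Storing the digest alongside the source pointer lets a reviewer confirm that a chunk has not been altered since it was extracted.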

Architecture

Language	Python 3.11+
Framework	FastAPI + Jinja2
Database	PostgreSQL + pgvector
AI Providers	OpenAI (embedding + chunking)
Integration	Native Cosolvent pipeline feed