FractalWaves

Dimensional Directory

A flexible, extensible, unit-based data management system for storing, organizing, and retrieving information.

System Overview

The Dimensional Directory (DD) is a core component of FractalWaves' computational infrastructure. It treats information units as geometric entities with coordinates in an information space, rather than as rows in a conventional database. At its core, a table-based index maps zero-indexed addresses to UUIDs, and each UUID acts as a pointer to the location of the corresponding data in HDF5 storage.

Core Innovation

The DD combines zero-indexed addressing with UUID-based deduplication, so every unit keeps a human-readable location while identical content is stored only once. This duality mirrors the fundamental principles of C-Space theory.

Practical Impact

This architecture supports storing, retrieving, and relating information units across scales, from tokens to documents to entire knowledge domains, while preserving referential integrity between addresses and stored content.

Key Features

UUID-Based Deduplication with Zero-Indexed Access

Identical content is stored once in HDF5 using UUIDs, but can be referenced through human-readable zero-indexed addresses via a table-based indexing system.
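
The source does not say how UUIDs are derived from content. One way to get "stored once" behaviour is to compute a deterministic UUID from the content itself. The sketch below assumes that approach, using `uuid.uuid5`, with plain dictionaries standing in for the indexing table and the HDF5 file. `DedupStore`, `put`, `get`, and `CONTENT_NAMESPACE` are hypothetical names, not part of the DD API.

```python
import uuid

# Hypothetical namespace for content-derived UUIDs (an assumption, not from DD).
CONTENT_NAMESPACE = uuid.uuid5(uuid.NAMESPACE_URL, "example.invalid/dd-content")


class DedupStore:
    """Toy stand-in: dicts replace the indexing table and the HDF5 file."""

    def __init__(self):
        self.blobs = {}  # UUID string -> content (stands in for HDF5)
        self.index = {}  # address -> UUID string (stands in for the index table)

    def put(self, address, content):
        # Identical content always yields the same UUID, so it is stored once.
        content_id = str(uuid.uuid5(CONTENT_NAMESPACE, content))
        self.blobs.setdefault(content_id, content)
        self.index[address] = content_id
        return content_id

    def get(self, address):
        # Two hops: address -> UUID -> content.
        return self.blobs[self.index[address]]


store = DedupStore()
first = store.put("doc-001-0", "This is a sample document.")
second = store.put("doc-002-3", "This is a sample document.")
```

Both addresses resolve to the same UUID here, so `store.blobs` holds a single copy of the sentence while each address stays individually readable.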

Table-Based Indexing System

Maps between zero-indexed addresses and UUIDs to efficiently locate data in HDF5 storage, providing a layer of indirection between user-facing addresses and physical storage.

Dual-Database Architecture

Combines SQLite for structured metadata with HDF5 for raw data and embeddings, optimizing for both relational integrity and high-dimensional vector operations.

Extensible Type System

Type-specific managers provide specialized operations for different content types, enabling domain-specific optimization within a unified framework.
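
As a rough illustration of type-specific managers, the sketch below registers one manager per content type and dispatches on a type name. The class names, the `split` method, and the sentence-splitting rule are all assumptions made for this example; the actual TypeManagers interface is not described in the source.

```python
import re


class TypeManager:
    """Hypothetical base class: turns raw input into a list of units."""

    def split(self, data):
        raise NotImplementedError


class DocumentManager(TypeManager):
    def split(self, data):
        # Naive rule: break after ., ! or ? when followed by whitespace.
        return [s for s in re.split(r"(?<=[.!?])\s+", data.strip()) if s]


# Registry keyed by input type; new content types plug in here.
MANAGERS = {"document": DocumentManager()}


def split_units(data, input_type):
    try:
        manager = MANAGERS[input_type]
    except KeyError:
        raise ValueError(f"no manager registered for {input_type!r}")
    return manager.split(data)


units = split_units(
    "This is a sample document. It contains multiple sentences.", "document"
)
```

With a registry like this, supporting a new content type means adding one manager class and leaving the dispatch code unchanged.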

Relation Management

Create and manage typed relationships between content units, enabling complex knowledge graphs with metadata-rich connections.
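
A typed relation can be modelled as a row holding a source, a target, a relation type, and a metadata payload. The sketch below assumes such a table in SQLite, since the DD keeps structured metadata there. The schema and the `add_relation` and `related` helpers are hypothetical, not the DD's own.

```python
import json
import sqlite3

rel_db = sqlite3.connect(":memory:")
rel_db.execute(
    """CREATE TABLE relations (
           source        TEXT NOT NULL,
           target        TEXT NOT NULL,
           relation_type TEXT NOT NULL,
           metadata      TEXT NOT NULL
       )"""
)


def add_relation(source, target, relation_type, metadata=None):
    # Metadata is stored as a JSON string so arbitrary keys can be attached.
    rel_db.execute(
        "INSERT INTO relations VALUES (?, ?, ?, ?)",
        (source, target, relation_type, json.dumps(metadata or {})),
    )


def related(source, relation_type):
    rows = rel_db.execute(
        "SELECT target, metadata FROM relations "
        "WHERE source = ? AND relation_type = ? ORDER BY rowid",
        (source, relation_type),
    ).fetchall()
    return [(target, json.loads(meta)) for target, meta in rows]


add_relation("doc-001-0", "doc-001-1", "precedes", {"weight": 1.0})
add_relation("doc-001-0", "doc-002-0", "cites")
```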

Embedding Support

First-class support for vector embeddings enables semantic operations across the information space, bridging symbolic and geometric representations.

System Architecture

  • EnhancedDDService
  • DataPrestructurer
  • TypeManagers
  • SQLite (Metadata)
  • HDF5 (Raw Data)

The Dimensional Directory architecture integrates multiple specialized components that work together to provide a unified data management experience across different information types and scales.

Zero-Indexed Addressing and UUID Mapping

The DD uses a hierarchical addressing scheme with a table-based mapping system that serves as a layer of indirection between human-readable addresses and the actual data storage:

Address Format & Mapping

doc:123

↪ Maps to a UUID in the indexing table

doc:123-0

↪ Maps to a specific UUID in HDF5

doc:123-0.0

↪ Maps to a specific UUID in HDF5
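
The three address forms above suggest a grammar of `type:id`, optionally followed by `-` and a dot-separated index path. The sketch below parses addresses under that assumption. The meaning of each level is not stated in the source, the usage example later in this page writes addresses in a different style (`doc-001-0`) that this parser does not cover, and `Address`, `parse_address`, and `parent` are hypothetical helpers.

```python
import re
from typing import NamedTuple, Tuple

# Assumed grammar: <kind>:<ident>[-<index>(.<index>)*]
ADDRESS_RE = re.compile(
    r"(?P<kind>[A-Za-z_]+):(?P<ident>\w+)(?:-(?P<path>\d+(?:\.\d+)*))?"
)


class Address(NamedTuple):
    kind: str
    ident: str
    path: Tuple[int, ...]  # zero-indexed position at each nested level


def parse_address(text):
    match = ADDRESS_RE.fullmatch(text)
    if match is None:
        raise ValueError(f"not a recognised address: {text!r}")
    raw_path = match.group("path")
    path = tuple(int(part) for part in raw_path.split(".")) if raw_path else ()
    return Address(match.group("kind"), match.group("ident"), path)


def parent(address):
    """Return the enclosing address, or None at the top level."""
    if not address.path:
        return None
    return address._replace(path=address.path[:-1])


leaf = parse_address("doc:123-0.0")
```

Under this reading, `parent(leaf)` is the address `doc:123-0`, and walking up once more reaches `doc:123`.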

System Benefits

  • Human-readable addressing for users
  • Direct access to HDF5 data via UUID pointers
  • Efficient storage through deduplication
  • Fast retrieval with optimized indexing

How It Works

The zero-indexed addresses are mapped to UUIDs through an indexing table, and these UUIDs serve as pointers to the exact locations of data in the HDF5 database. This approach creates an abstraction layer over the physical storage, allowing for both human-friendly addressing and highly efficient data retrieval.
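
The indirection described above can be sketched as a two-column lookup table. The example below assumes the table lives in SQLite and uses a made-up schema; `address_index`, `register`, and `resolve` are hypothetical names, and a random UUID stands in for a real HDF5 pointer.

```python
import sqlite3
import uuid

index_db = sqlite3.connect(":memory:")
index_db.execute(
    "CREATE TABLE address_index ("
    "address TEXT PRIMARY KEY, unit_uuid TEXT NOT NULL)"
)


def register(address, unit_uuid):
    # Re-registering an address re-points it without touching stored data.
    index_db.execute(
        "INSERT OR REPLACE INTO address_index (address, unit_uuid) VALUES (?, ?)",
        (address, unit_uuid),
    )


def resolve(address):
    row = index_db.execute(
        "SELECT unit_uuid FROM address_index WHERE address = ?", (address,)
    ).fetchone()
    return row[0] if row else None


sentence_uuid = str(uuid.uuid4())
register("doc-001-0", sentence_uuid)
```

Because users only ever see the address, the UUID behind it can change (for example after content is rewritten) without any address held elsewhere becoming invalid.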

Applications & Benefits

Practical Applications

  • Knowledge Management: Unified storage for documents, data, and their interrelationships
  • Data Integration: Seamless bridging between structured and unstructured information
  • Content Generation: Foundation for coherent multi-scale content creation
  • Semantic Search: Unified symbolic and vector-based retrieval of information

Key Benefits

  • Performance: Direct access to HDF5 data locations through UUID pointers
  • Scalability: Indexing system scales efficiently with growing data volumes
  • Flexible Addressing: Human-readable navigation through complex information spaces
  • Extensible Framework: Adaptable to domain-specific information needs

Implementation Example

Basic Usage

from dimensional_directory.service.dd_service import EnhancedDDService

# Initialize the service
service = EnhancedDDService(base_path="./dd_data")

# Process a document - returns UUID information
result = service.process_input(
    data="This is a sample document. It contains multiple sentences.",
    input_type="document",
    context_id="doc-001"
)

# Get UUID for a zero-indexed address
# (named unit_uuid to avoid shadowing Python's standard uuid module)
unit_uuid = service.get_uuid_for_address("doc-001-0")
print(unit_uuid)  # e.g. "550e8400-e29b-41d4-a716-446655440000"

# Get content using the zero-indexed address
# (internally translates to UUID for HDF5 access)
unit = service.get_unit("doc-001-0", resolve_type="address")
print(unit['content'])  # "This is a sample document."

# Get content using UUID directly
unit = service.get_unit(unit_uuid, resolve_type="uuid")
print(unit['content'])  # "This is a sample document."

Advanced HDF5 Access

import numpy as np

# Get raw HDF5 data for a specific unit
hdf5_data = service.get_hdf5_data("doc-001-0", data_type="raw")

# Get structured HDF5 data
structured_data = service.get_hdf5_data("doc-001", data_type="structured")

# Get embedding from HDF5
stored_embedding = service.get_hdf5_data("doc-001-0", data_type="embedding")

# Working with embeddings

# Add an embedding for a unit (maps to UUID internally)
embedding = np.random.rand(768).tolist()  # Example vector
service.add_embedding("doc-001-0", embedding, unit_type="address")

# Semantic search using embeddings
similar_units = service.similarity_search(
    embedding,
    top_k=5,
    threshold=0.7
)
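
The source does not specify the similarity metric or the exact meaning of `top_k` and `threshold`. A common reading is: score every stored vector by cosine similarity, drop scores below `threshold`, and return the `top_k` best. The standalone sketch below assumes that reading; `toy_similarity_search` is a hypothetical stand-in, not the service's implementation.

```python
import math


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def toy_similarity_search(query, stored, top_k=5, threshold=0.7):
    # Score every stored vector, keep those at or above the threshold,
    # then return the best top_k as (address, score) pairs.
    scored = [(address, cosine(query, vec)) for address, vec in stored.items()]
    kept = [item for item in scored if item[1] >= threshold]
    kept.sort(key=lambda item: item[1], reverse=True)
    return kept[:top_k]


stored = {
    "doc-001-0": [1.0, 0.0],
    "doc-001-1": [0.8, 0.6],
    "doc-002-0": [0.0, 1.0],
}
hits = toy_similarity_search([1.0, 0.0], stored, top_k=2, threshold=0.7)
```

In this toy data, the orthogonal vector at `doc-002-0` scores 0 and is filtered out by the threshold before `top_k` is applied.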