PRUNEVISION | API Documentation

Core Concepts

PRUNEVISION exposes four analytical algorithms that score how important each "token" (an image patch or one vector of an embedding tensor) is, so that the least important tokens can be pruned.

Static

Scores each token by its L2 norm (magnitude) and keeps the tokens with the highest scores, i.e. the strongest activations.
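A minimal NumPy sketch of L2-norm pruning, for illustration only. The function name, the rounding of the kept-token count, and the choice to keep at least one token are assumptions, not confirmed server behavior.

```python
import numpy as np

def prune_by_l2_norm(tokens: np.ndarray, prune_ratio: float) -> np.ndarray:
    """Keep the tokens with the largest L2 norm.

    tokens has shape (B, N, D); the result has shape (B, N_keep, D).
    """
    _, n, _ = tokens.shape
    # Assumption: round to the nearest token count and never drop everything.
    n_keep = max(1, int(round(n * (1.0 - prune_ratio))))
    scores = np.linalg.norm(tokens, axis=-1)          # (B, N) magnitudes
    top = np.argsort(-scores, axis=1)[:, :n_keep]     # highest scores first
    top = np.sort(top, axis=1)                        # restore original token order
    return np.take_along_axis(tokens, top[:, :, None], axis=1)
```

Sorting the kept indices preserves the original token order, which matters if positions carry meaning downstream.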

Entropy

Measures the information density of each token vector as an entropy computed from its softmax distribution and log-probabilities.
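One way to read this is Shannon entropy over a softmax of each token's values. The sketch below assumes that interpretation; the function name is hypothetical and the server's exact formula may differ.

```python
import numpy as np

def token_entropy(tokens: np.ndarray) -> np.ndarray:
    """Shannon entropy (in nats) of softmax(token) per token: (B, N, D) -> (B, N)."""
    # Subtract the per-token maximum so exp() cannot overflow.
    z = tokens - tokens.max(axis=-1, keepdims=True)
    log_p = z - np.log(np.exp(z).sum(axis=-1, keepdims=True))  # log-softmax
    p = np.exp(log_p)
    return -(p * log_p).sum(axis=-1)
```

A token with uniform values reaches the maximum entropy of log(D), while a token dominated by a single large value scores close to zero.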

Fractal

Computes the Box-Counting Dimension (complexity) of the signal structure.
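As an illustration, the sketch below estimates a box-counting dimension by treating a token's D values as a 1-D signal plotted in the unit square. That mapping, the grid scales, and the function name are assumptions; it counts boxes hit by the sample points rather than by the connecting curve, so it is only an approximation.

```python
import numpy as np

def box_counting_dimension(signal, scales=(2, 4, 8, 16, 32)) -> float:
    """Estimate the box-counting dimension of a 1-D signal's graph."""
    y = np.asarray(signal, dtype=np.float64)
    x = np.linspace(0.0, 1.0, y.size)
    span = y.max() - y.min()
    # Normalise values to [0, 1]; a constant signal collapses to a flat line.
    y = (y - y.min()) / span if span > 0 else np.zeros_like(y)
    counts = []
    for s in scales:
        # Grid cell of each sample on an s-by-s grid (clip the x == 1 edge).
        ix = np.minimum((x * s).astype(int), s - 1)
        iy = np.minimum((y * s).astype(int), s - 1)
        counts.append(len(set(zip(ix.tolist(), iy.tolist()))))
    # Dimension = slope of log(box count) against log(grid resolution).
    slope, _ = np.polyfit(np.log(scales), np.log(counts), 1)
    return float(slope)
```

A smooth ramp yields a dimension near 1, while noisy signals fill more of the plane and score higher.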

Neighborhood

Evaluates token centrality and local variance relative to other tokens.
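The exact definitions of centrality and local variance are not given here. The sketch below assumes one plausible reading: centrality as mean cosine similarity to the k most similar tokens, and local variance as mean squared deviation from those neighbors' average. The function name and k are hypothetical, and how the two values combine into a final score is left open.

```python
import numpy as np

def neighborhood_scores(tokens, k: int = 8):
    """tokens: (N, D). Returns (centrality, local_variance), each of shape (N,)."""
    tokens = np.asarray(tokens, dtype=np.float64)
    n = tokens.shape[0]
    if n < 2:
        raise ValueError("need at least two tokens to define a neighborhood")
    k = min(k, n - 1)
    unit = tokens / (np.linalg.norm(tokens, axis=1, keepdims=True) + 1e-12)
    sim = unit @ unit.T                           # cosine similarity, (N, N)
    np.fill_diagonal(sim, -np.inf)                # a token is not its own neighbor
    nearest = np.argsort(-sim, axis=1)[:, :k]     # k most similar tokens per row
    centrality = np.take_along_axis(sim, nearest, axis=1).mean(axis=1)
    local_mean = tokens[nearest].mean(axis=1)     # (N, D) neighbor average
    local_variance = ((tokens - local_mean) ** 2).mean(axis=1)
    return centrality, local_variance
```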

1. Tensor Processing

Process raw embedding tensors efficiently. Preferred for backend-to-backend communication.

POST /prune/embeddings-binary

High-performance endpoint that accepts raw bytes (a NumPy or PyTorch buffer), avoiding JSON parsing overhead.

Query Parameters

Name	Type	Description
method	string	`entropy` (default), `static`, `fractal`, `neighborhood`
prune_ratio	float	0.0 to 1.0 (e.g., 0.5 removes 50% of tokens)
shape	string	Input shape "B,N,D" (e.g., "1,196,768")
return_binary	bool	If true, returns raw bytes of pruned tensor.
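Because the body is a raw buffer, the shape string is the only description of its layout. A small client-side check, sketched below with hypothetical helper names, can catch a mismatch between shape and payload size before sending. It assumes float32 (4 bytes per value), as stated for the request body.

```python
def parse_shape(shape: str) -> tuple[int, int, int]:
    """Parse a "B,N,D" string into three positive integers."""
    parts = [int(p) for p in shape.split(",")]
    if len(parts) != 3 or any(p <= 0 for p in parts):
        raise ValueError(f"shape must be 'B,N,D' with positive sizes, got {shape!r}")
    b, n, d = parts
    return b, n, d

def expected_payload_bytes(shape: str, itemsize: int = 4) -> int:
    """Number of bytes a float32 tensor of this shape occupies."""
    b, n, d = parse_shape(shape)
    return b * n * d * itemsize

# "1,196,768" corresponds to 1 * 196 * 768 * 4 = 602112 bytes.
```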

Request Body

Multipart/Form-Data: File upload containing raw float32 bytes.

Rate Limit: 20/min

Python Example

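import numpy as np
import requests

# Generate dummy data: Batch=1, Tokens=196, Dim=768
data = np.random.rand(1, 196, 768).astype(np.float32)
shape = ",".join(str(n) for n in data.shape)  # "1,196,768"

response = requests.post(
    "http://localhost:8000/prune/embeddings-binary",
    params={"shape": shape, "method": "fractal", "prune_ratio": 0.3},
    # Send the raw float32 buffer as a multipart file upload
    files={"file": ("tensor.bin", data.tobytes(), "application/octet-stream")},
    timeout=30,
)
response.raise_for_status()  # Fail loudly on HTTP errors, e.g. a rate-limit rejection
print(response.json())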
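With return_binary set to true, the response is raw bytes rather than JSON. The sketch below shows one way to turn such a body back into an array. It assumes the body is a contiguous float32 buffer that keeps the batch size and embedding dimension of the input; the layout is not specified here, and the function name is hypothetical.

```python
import numpy as np

def decode_pruned_tensor(raw: bytes, batch: int, dim: int) -> np.ndarray:
    """Reshape a raw float32 buffer to (batch, N_kept, dim), inferring N_kept."""
    flat = np.frombuffer(raw, dtype=np.float32)
    if flat.size == 0 or flat.size % (batch * dim) != 0:
        raise ValueError("buffer size does not match the given batch and dim")
    # np.frombuffer returns a read-only view; call .copy() before modifying it.
    return flat.reshape(batch, -1, dim)
```

For the request above this would be called as decode_pruned_tensor(response.content, batch=1, dim=768).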

2. Vision & Visualization

Directly upload images to analyze pruning behavior on visual patches.

POST /prune/visualize-reconstruction

Returns a PNG image showing which parts of the image were kept (color) and removed (black).

Name	Description
file	Multipart image file (JPEG/PNG)
method	Pruning strategy to apply: `entropy`, `static`, `fractal`, or `neighborhood`.
prune_ratio	Fraction of patches to mask out, from 0.0 to 1.0 (e.g., 0.7 masks 70%).
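Since this endpoint returns image bytes rather than JSON, a client may want to confirm the body really is a PNG before writing it to disk (an error response could be JSON or HTML instead). The helper below is a hypothetical sketch that checks the standard 8-byte PNG signature.

```python
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"  # first 8 bytes of every PNG file

def save_png(body: bytes, path: str) -> None:
    """Write the response body to path, rejecting anything that is not a PNG."""
    if not body.startswith(PNG_SIGNATURE):
        raise ValueError("response body is not a PNG image")
    with open(path, "wb") as fh:
        fh.write(body)
```

With the requests library this would typically receive response.content.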

POST /prune/image

Converts the image to tokens internally, prunes them, and returns token-reduction and FLOPs-reduction metrics.

Response JSON Example:

{
  "filename": "cat.jpg",
  "method": "entropy",
  "metrics": {
    "original_tokens": 196,
    "remaining_tokens": 98,
    "token_reduction_ratio": 0.5,
    "flops_reduction_ratio": 0.5012
  }
}
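The token counts in the response are enough to recompute the token reduction ratio on the client, which is a cheap sanity check. The sketch below parses the example payload above; the helper name is hypothetical.

```python
import json

def token_reduction(metrics: dict) -> float:
    """Fraction of tokens removed, derived from the two token counts."""
    return 1.0 - metrics["remaining_tokens"] / metrics["original_tokens"]

payload = json.loads("""
{
  "filename": "cat.jpg",
  "method": "entropy",
  "metrics": {
    "original_tokens": 196,
    "remaining_tokens": 98,
    "token_reduction_ratio": 0.5,
    "flops_reduction_ratio": 0.5012
  }
}
""")

ratio = token_reduction(payload["metrics"])  # 1 - 98/196 = 0.5
```

A FLOPs reduction slightly above the token reduction would be consistent with attention cost growing faster than linearly in the token count, though how the server computes this figure is not specified here.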

3. Benchmarking & IoT

Compare All

/benchmark/compare-all

Upload a binary tensor once. The server runs all 4 algorithms and returns a JSON comparison list sorted by signal preservation score.

Ideal for Research Papers
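The definition of the signal preservation score is not given here. Purely as an illustration, the sketch below ranks methods by the fraction of squared L2 norm retained in the kept tokens. That metric, the function names, and the "method"/"score" keys are all assumptions, not the server's response schema.

```python
import numpy as np

def retained_energy(tokens: np.ndarray, keep_idx: np.ndarray) -> float:
    """Fraction of total squared L2 norm carried by the kept tokens. tokens: (N, D)."""
    energy = (tokens.astype(np.float64) ** 2).sum(axis=1)
    total = energy.sum()
    return float(energy[keep_idx].sum() / total) if total > 0 else 0.0

def rank_methods(tokens: np.ndarray, selections: dict) -> list:
    """selections maps a method name to the indices that method keeps."""
    rows = [
        {"method": name, "score": retained_energy(tokens, idx)}
        for name, idx in selections.items()
    ]
    # Best-preserving method first, mirroring a list sorted by score.
    return sorted(rows, key=lambda row: row["score"], reverse=True)
```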

IoT Transmission

/optimize/transmission

Compresses an image into a custom .spv (Sparse Vision) format containing only high-entropy patches.

Returns an X-Savings-Percent response header.
Uses zlib compression on the sparse data.
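The .spv layout is not documented here. The sketch below is not that format; it only illustrates the general idea of storing the kept patches with their indices, compressing the result with zlib, and expressing the savings as a percentage. All names and the byte layout are hypothetical, and how the server computes X-Savings-Percent is an assumption.

```python
import struct
import zlib

def pack_sparse(patches: list[bytes], keep: list[int]) -> bytes:
    """Illustrative container: count, then (index, length, data) per kept patch."""
    body = struct.pack("<I", len(keep))
    for i in keep:
        body += struct.pack("<II", i, len(patches[i])) + patches[i]
    return zlib.compress(body, 9)

def unpack_sparse(blob: bytes) -> dict[int, bytes]:
    """Inverse of pack_sparse: map each kept patch index to its bytes."""
    body = zlib.decompress(blob)
    (count,) = struct.unpack_from("<I", body, 0)
    offset, out = 4, {}
    for _ in range(count):
        index, length = struct.unpack_from("<II", body, offset)
        offset += 8
        out[index] = body[offset:offset + length]
        offset += length
    return out

def savings_percent(original_size: int, sent_size: int) -> float:
    """Percentage of bytes saved relative to the original size."""
    return 100.0 * (1.0 - sent_size / original_size)
```

Storing explicit indices lets the receiver place each patch back at its original position and leave the pruned ones blank.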
