PRUNEVISION API Docs

Analytical benchmarking for Vision Transformer (ViT) pruning.
Estimate FLOPs, visualize token reduction, and optimize transmission without training.

Base URL: /

Fast & Stateless | Python/Torch

Core Concepts

PRUNEVISION exposes four analytical scoring methods that decide which "tokens" (image patches or embedding vectors) are important enough to keep.

Static

Uses the L2 norm (magnitude) of each token vector. Keeps the tokens with the highest norm, i.e. the highest activation energy.
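As a minimal sketch of this kind of scoring (not the service's actual implementation), the snippet below ranks tokens by L2 norm and keeps the strongest ones. The function name `static_keep_indices` and the rounding down of the number of removed tokens are assumptions made for illustration. Plain Python is used so the sketch has no dependencies.

```python
import math

def static_keep_indices(tokens, prune_ratio):
    """Return indices of tokens kept after dropping the lowest-norm ones.

    tokens: list of N token vectors (each a list of D floats).
    prune_ratio: fraction of tokens to remove, between 0.0 and 1.0.
    """
    if not 0.0 <= prune_ratio <= 1.0:
        raise ValueError("prune_ratio must be between 0.0 and 1.0")
    norms = [math.sqrt(sum(v * v for v in tok)) for tok in tokens]
    # Assumption: the number of removed tokens is rounded down.
    n_keep = len(tokens) - int(len(tokens) * prune_ratio)
    # Rank by norm (largest first), then restore the original token order.
    ranked = sorted(range(len(tokens)), key=lambda i: norms[i], reverse=True)
    return sorted(ranked[:n_keep])

tokens = [[0.1, 0.1], [3.0, 4.0], [0.0, 0.2], [1.0, 1.0]]
print(static_keep_indices(tokens, 0.5))  # [1, 3]
```

With NumPy, the per-token norms for a (B, N, D) array are available as `np.linalg.norm(x, axis=-1)`.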

Entropy

Measures the information density of each token vector from the entropy of its softmax distribution (softmax, then log).
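One way to read this description is as the Shannon entropy of the softmax of each token vector. The sketch below computes that score for a single token. The function name `softmax_entropy` is hypothetical, and the service may normalize or scale its scores differently.

```python
import math

def softmax_entropy(token):
    """Shannon entropy (in nats) of the softmax of one token vector."""
    peak = max(token)  # subtract the max for numerical stability
    exps = [math.exp(v - peak) for v in token]
    total = sum(exps)
    probs = [e / total for e in exps]
    return -sum(p * math.log(p) for p in probs if p > 0.0)

flat = [1.0, 1.0, 1.0, 1.0]      # uniform softmax: maximum entropy, log(4)
peaked = [10.0, 0.0, 0.0, 0.0]   # one dominant value: entropy near zero
print(softmax_entropy(flat), softmax_entropy(peaked))
```

A vector whose values are all similar scores high, while a vector dominated by a single value scores close to zero.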

Fractal

Computes the Box-Counting Dimension (complexity) of the signal structure.
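To make the idea concrete, here is one simple box-counting estimator for a 1-D signal, offered as a sketch rather than the algorithm the service runs. It overlays grids of k x k boxes on the signal's graph, counts the boxes the graph touches, and fits the slope of log(count) against log(k). The function name `box_counting_dimension` and the grid sizes are assumptions.

```python
import math

def box_counting_dimension(signal, grid_sizes=(2, 4, 8, 16)):
    """Estimate the box-counting dimension of a 1-D signal's graph."""
    n = len(signal)
    lo, hi = min(signal), max(signal)
    span = (hi - lo) or 1.0                   # flat signal: avoid dividing by zero
    ys = [(v - lo) / span for v in signal]    # rescale values to [0, 1]
    xs = [i / (n - 1) for i in range(n)]      # sample positions in [0, 1]

    log_k, log_count = [], []
    for k in grid_sizes:
        # For each grid column, track the lowest and highest row touched.
        rows = {}
        for x, y in zip(xs, ys):
            col = min(int(x * k), k - 1)
            row = min(int(y * k), k - 1)
            low, high = rows.get(col, (row, row))
            rows[col] = (min(low, row), max(high, row))
        # Boxes covered = vertical extent of the samples inside every column.
        count = sum(high - low + 1 for low, high in rows.values())
        log_k.append(math.log(k))
        log_count.append(math.log(count))

    # Least-squares slope of log(count) against log(k).
    mean_x = sum(log_k) / len(log_k)
    mean_y = sum(log_count) / len(log_count)
    num = sum((a - mean_x) * (b - mean_y) for a, b in zip(log_k, log_count))
    den = sum((a - mean_x) ** 2 for a in log_k)
    return num / den

ramp = list(range(256))                  # smooth line: dimension close to 1
zigzag = [i % 2 for i in range(256)]     # fills the plane: dimension close to 2
print(box_counting_dimension(ramp), box_counting_dimension(zigzag))
```

Smooth signals land near 1 and highly irregular signals approach 2, which is the sense in which the dimension measures complexity.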

Neighborhood

Evaluates token centrality and local variance relative to other tokens.
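A rough illustration of the centrality part of this idea: the sketch below scores each token by its mean cosine similarity to all other tokens. This is only one possible definition. The function name `centrality_scores` is hypothetical, and the local-variance term is left out because its form is not specified here.

```python
import math

def centrality_scores(tokens):
    """Mean cosine similarity of each token to every other token."""
    def cosine(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        na = math.sqrt(sum(x * x for x in a))
        nb = math.sqrt(sum(y * y for y in b))
        return dot / (na * nb) if na and nb else 0.0

    n = len(tokens)
    scores = []
    for i in range(n):
        others = [cosine(tokens[i], tokens[j]) for j in range(n) if j != i]
        scores.append(sum(others) / len(others) if others else 0.0)
    return scores

tokens = [[1.0, 0.0], [0.9, 0.1], [1.0, 0.1], [0.0, 1.0]]
scores = centrality_scores(tokens)
print(scores.index(min(scores)))  # 3: the outlier is the least central token
```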

1. Tensor Processing

Process raw embedding tensors efficiently. Preferred for backend-to-backend communication.

POST /prune/embeddings-binary

High-performance endpoint accepting raw bytes (numpy/torch buffer). Avoids JSON parsing overhead.

Query Parameters
Name           Type    Description
method         string  entropy (default), static, fractal, neighborhood
prune_ratio    float   0.0 to 1.0 (e.g., 0.5 removes 50% of tokens)
shape          string  Input shape "B,N,D" (e.g., "1,196,768")
return_binary  bool    If true, returns raw bytes of pruned tensor.

Request Body

Multipart/Form-Data: File upload containing raw float32 bytes.
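Because the body is a bare float32 buffer, its length is determined by the shape: B x N x D values at 4 bytes each. The helper below is a client-side sanity check to run before uploading. The name `expected_body_size` is hypothetical, and whether the server rejects mismatched sizes is not stated here.

```python
import struct

def expected_body_size(shape):
    """Bytes needed for a float32 tensor with shape string "B,N,D"."""
    dims = [int(part) for part in shape.split(",")]
    if len(dims) != 3 or any(d <= 0 for d in dims):
        raise ValueError(f'shape must be "B,N,D" with positive integers, got {shape!r}')
    batch, tokens, dim = dims
    return batch * tokens * dim * struct.calcsize("<f")  # float32 = 4 bytes

print(expected_body_size("1,196,768"))  # 602112
```

With NumPy, the value can be compared against `data.nbytes` for the array being sent.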

Rate Limit: 20/min
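Python Example
import numpy as np
import requests

# Generate dummy data: Batch=1, Tokens=196, Dim=768
data = np.random.rand(1, 196, 768).astype(np.float32)

response = requests.post(
    "http://localhost:8000/prune/embeddings-binary",
    params={"shape": "1,196,768", "method": "fractal", "prune_ratio": 0.3},
    # Raw float32 bytes sent as the multipart "file" field (the filename is arbitrary)
    files={"file": ("tensor.bin", data.tobytes(), "application/octet-stream")},
    timeout=30,
)
response.raise_for_status()  # fail loudly on HTTP errors
print(response.json())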

2. Vision & Visualization

Directly upload images to analyze pruning behavior on visual patches.

POST /prune/visualize-reconstruction

Returns a PNG image showing which parts of the image were kept (color) and removed (black).

Name         Description
file         Multipart image file (JPEG/PNG)
method       Pruning strategy to apply.
prune_ratio  Fraction of patches to mask out (e.g., 0.7 masks 70%).
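A rough local sketch of what such a visualization does: pixels belonging to removed patches are set to black and kept patches are left untouched. It uses a tiny grayscale image stored as nested lists. The function `mask_patches` and its arguments are hypothetical and are not the endpoint's code.

```python
def mask_patches(image, patch_size, removed):
    """Black out removed patches in a grayscale image (list of pixel rows).

    removed: set of (patch_row, patch_col) grid coordinates to mask.
    """
    out = [row[:] for row in image]  # copy so the input stays untouched
    for y, row in enumerate(out):
        for x in range(len(row)):
            if (y // patch_size, x // patch_size) in removed:
                row[x] = 0           # removed patch -> black
    return out

image = [[9] * 4 for _ in range(4)]  # 4x4 image split into 2x2 patches
masked = mask_patches(image, 2, {(0, 1), (1, 0)})
for row in masked:
    print(row)
```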

POST /prune/image

Converts the image to tokens internally, prunes them, and returns token-reduction and FLOPs-reduction metrics.

Response JSON Example:
{
  "filename": "cat.jpg",
  "method": "entropy",
  "metrics": {
    "original_tokens": 196,
    "remaining_tokens": 98,
    "token_reduction_ratio": 0.5,
    "flops_reduction_ratio": 0.5012
  }
}
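The FLOPs reduction is slightly larger than the token reduction because self-attention cost grows quadratically with the number of tokens. The sketch below illustrates this with a common approximate cost model for one transformer encoder block. The model, the embedding width of 768, and the function names are assumptions. The service's own cost model is not documented here, so its reported flops_reduction_ratio may differ from this estimate.

```python
def block_macs(n_tokens, dim, mlp_ratio=4):
    """Approximate multiply-accumulates for one transformer encoder block."""
    projections = 4 * n_tokens * dim * dim        # Q, K, V and output projections
    attention = 2 * n_tokens * n_tokens * dim     # QK^T scores and weighted sum of V
    mlp = 2 * n_tokens * dim * (mlp_ratio * dim)  # two linear layers
    return projections + attention + mlp

def reduction_ratios(original_tokens, remaining_tokens, dim=768):
    """Token and FLOPs reduction, assuming pruning applies to every block."""
    token_ratio = 1 - remaining_tokens / original_tokens
    flops_ratio = 1 - block_macs(remaining_tokens, dim) / block_macs(original_tokens, dim)
    return token_ratio, flops_ratio

token_ratio, flops_ratio = reduction_ratios(196, 98)
print(round(token_ratio, 4), round(flops_ratio, 4))
```

The token ratio is exactly 1 - 98/196 = 0.5, and under this model the FLOPs ratio comes out a little above it.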

3. Benchmarking & IoT

Compare All

/benchmark/compare-all

Upload a binary tensor once. The server runs all 4 algorithms and returns a JSON comparison list sorted by signal preservation score.

Ideal for Research Papers

IoT Transmission

/optimize/transmission

Compresses an image into a custom .spv (Sparse Vision) format containing only high-entropy patches.

  • Returns an X-Savings-Percent response header.
  • Uses zlib compression on the sparse data.
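The .spv layout itself is not documented here, so the following is only an illustration of the general idea: serialize the kept patches, compress them with zlib, and report the savings as a percentage of the dense size. The byte layout, `pack_sparse`, and `savings_percent` are all hypothetical.

```python
import struct
import zlib

def pack_sparse(patches, keep):
    """Serialize only the kept patches and compress them with zlib.

    patches: list of equal-length float lists; keep: indices to retain.
    Layout (illustrative only): uint32 count, then per patch a uint32 index
    followed by its float32 values.
    """
    payload = struct.pack("<I", len(keep))
    for idx in keep:
        payload += struct.pack("<I", idx)
        payload += struct.pack(f"<{len(patches[idx])}f", *patches[idx])
    return zlib.compress(payload)

def savings_percent(original_size, sent_size):
    """Percentage of bytes saved relative to the dense representation."""
    return 100.0 * (1 - sent_size / original_size)

patches = [[0.0] * 64, [0.5] * 64, [0.0] * 64, [0.25] * 64]
blob = pack_sparse(patches, keep=[1, 3])
original = len(patches) * 64 * 4  # dense float32 size in bytes
print(f"{savings_percent(original, len(blob)):.1f}% saved")
```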