Superpixel Feature Extractor
Generate features for superpixels and patches using pretrained models. Features are saved as .npz
files with keys for the features ('feats'
), superpixel bounding box ('bbox'
), and region adjacency edges ('rag'
).
In terms of feature space, the code supports ResNet, CLIP, BLIP, and SigLIP features. Currently, only SLIC and Watershed superpixel segmentation algorithms are implemented (via scikit-image) however, patching is also supported (but not with the --rag
flag).
To generate a collection of superpixels for the COCO dataset, see runner.sh
for an example of how this can be achieved.
For compatiblity with the Karpathy Split of the COCO dataset, merge_and_clean.py
is provided. This script will move and rename the superpixel feature files such that they can be used in place of the BUTD files in the original Karpathy Split JSON files.
Dependencies
conda env create -f environment.yml
conda create --name sp python=3.9
conda activate sp
python3 -m pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118
python3 -m pip install torch_geometric
python3 -m pip install git+https://github.com/openai/CLIP.git
python3 -m pip install salesforce-lavis
python3 -m pip install -U timm==1.0.14
python3 -m pip install -U transformers==4.48.1
Parameters
Name | Description |
---|---|
--image_dir | The directory containnig image inputs |
--save_dir | The directory to save the npz files to |
--feature_extractor | Which model to use? [BLIP / CLIP / RESNET / SIGLIP / VIT] |
--num_segments | The number of superpixels to generate per image (Not compatible with --whole_img ) |
--segmenter | Which superpixel algorithm to use? [SLIC / WATERSHED] |
--whole_img | (Flag) Generate a single feature for the whole image (Not compatible with --rag ) |
--patches | (Flag) Generate patch features instead of superpixel features (Not compatible with --rag ) |
--rag | (Flag) Generate the Region Adjacency Graph edges between superpixels |
⚠️ The
--segmenter PATCHES
flag will try to generate the number of segments give in--num-segments
. However, this number must result in the patching kernel size being a whole number. i.e. must be an integer (where is--num_segments
). See patcher.py for implementation details.
Feature Sizes
Model | Feature Dim |
---|---|
ResNet | 2048 |
CLIP | 512 |
BLIP | 768 |
SigLIP | 1152 |
ViT (vit_b_16 ) | 1000 |
Examples
Generate 25 Watershed superpixel CLIP features for the Karpathy Test Set with RAG edges
python3 main.py --image_dir "/home/hsenior/coco/img/test2014/" \
--save_dir "/home/hsenior/coco/superpixel_features/" \
--model_id "CLIP" \
--num_segments 25 \
--segmenter "WATERSHED" \
--rag
Generate whole image ResNet features for the Karpathy Validation set
python3 main.py --image_dir "/home/hsenior/coco/img/val2014/" \
--save_dir "/home/hsenior/coco/superpixel_features/" \
--model_id "RESNET" \
--whole_img
Generate 196 patch BLIP features for the Karpathy Train Set
python3 main.py --image_dir "/home/hsenior/coco/img/train2014/" \
--save_dir "/home/hsenior/coco/superpixel_features/" \
--model_id "BLIP" \
--num_segments 196 \
--segmenter "PATCHER" \