BiCLIP
Domain Canonicalization via Structured Geometric Transformation.
BiCLIP addresses the “modality gap” in Vision-Language Models like CLIP and SigLIP. By introducing a structured, bilinear transformation matrix, we achieve state-of-the-art domain adaptation with extreme parameter efficiency.
BiCLIP realigns visual features to the textual manifold.
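As a rough illustration of the idea, the sketch below applies a learnable matrix to image features before computing cosine-similarity logits against text features. This page does not specify how the matrix is structured or trained, so everything here is an assumption: a dense `d x d` matrix `W`, identity initialization, a fixed logit scale of 100, and the hypothetical helper names `l2_normalize` and `bilinear_logits`. The toy "domain shift" is a random rotation, chosen only because it can be undone exactly.

```python
import numpy as np

def l2_normalize(x, eps=1e-12):
    """Scale each row to unit length (eps guards against zero vectors)."""
    norms = np.linalg.norm(x, axis=-1, keepdims=True)
    return x / np.maximum(norms, eps)

def bilinear_logits(image_feats, text_feats, W, scale=100.0):
    """Hypothetical bilinear scoring: map image features through W,
    then take scaled cosine similarity with the text features."""
    v = l2_normalize(image_feats @ W)   # realigned image features
    t = l2_normalize(text_feats)
    return scale * (v @ t.T)            # shape: (num_images, num_classes)

rng = np.random.default_rng(0)
d = 8
image_feats = rng.normal(size=(4, d))   # stand-in for image embeddings
text_feats = rng.normal(size=(3, d))    # stand-in for class-prompt embeddings

# With W = I the scores reduce to plain zero-shot cosine logits.
baseline = bilinear_logits(image_feats, text_feats, np.eye(d))

# Toy domain shift: rotate the image features with an orthogonal matrix R.
R, _ = np.linalg.qr(rng.normal(size=(d, d)))
shifted = image_feats @ R

# Choosing W = R^T undoes the rotation, so the baseline logits come back.
recovered = bilinear_logits(shifted, text_feats, R.T)
```

In practice `W` would be learned from a small amount of labeled target-domain data rather than read off a known rotation. Starting from the identity means the adapted model begins at zero-shot behavior.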
Key Highlights
- SOTA Performance: +15.2% average improvement over zero-shot CLIP across 11 benchmarks.
- Extreme Gains: Up to +42% improvement on specialized domains like EuroSAT.
- Geometric Insight: Validates that domain shift can be recovered via canonical transformations.
References
2026
- BiCLIP: Domain Canonicalization via Structured Geometric Transformation. arXiv preprint arXiv:2603.08942, 2026.