Computer Vision

39 tools for computer vision.

Backbones & classification

  • AlexNet is a deep convolutional neural network that popularized GPU-trained CNNs for large-scale image classification.

  • ConvNeXt modernizes convolutional networks with design choices inspired by vision transformers for image recognition tasks.

  • DenseNet is a CNN that connects each layer to every subsequent layer within a dense block to improve feature reuse and gradient flow.

  • Meta self-supervised vision foundation model for learning general-purpose image features without labels.

  • EfficientNet scales CNN depth, width, and resolution jointly with a compound coefficient for efficient image recognition models.

  • GoogLeNet/Inception uses inception modules to build efficient deep convolutional networks for image classification and detection.

  • MobileNet is a family of efficient CNN architectures using depthwise separable convolutions for mobile vision tasks.

  • ResNet introduced deep residual learning with skip connections for training very deep image recognition networks.

  • PyTorch image models collection with pretrained vision architectures and training utilities.

  • VGG is a family of very deep convolutional networks from Oxford's Visual Geometry Group, known for simple stacks of 3x3 convolutions for image recognition.

  • Vision Transformer applies Transformer encoders to image patches for image classification and vision representation learning.
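
The MobileNet entry above mentions depthwise separable convolutions. As a rough illustration of why they are cheaper than standard convolutions, the sketch below counts the weights in each kind of layer. The function names and the layer sizes are hypothetical and chosen only for illustration; biases are ignored, and this is not code from any MobileNet implementation.

```python
def standard_conv_params(k: int, c_in: int, c_out: int) -> int:
    """Weights in a standard k x k convolution (bias ignored)."""
    return k * k * c_in * c_out


def depthwise_separable_params(k: int, c_in: int, c_out: int) -> int:
    """Weights in a depthwise k x k conv followed by a 1 x 1 pointwise conv."""
    depthwise = k * k * c_in   # one k x k filter per input channel
    pointwise = c_in * c_out   # 1 x 1 conv that mixes channels
    return depthwise + pointwise


if __name__ == "__main__":
    k, c_in, c_out = 3, 64, 128  # illustrative layer sizes (assumed)
    std = standard_conv_params(k, c_in, c_out)
    sep = depthwise_separable_params(k, c_in, c_out)
    print(f"standard: {std}, separable: {sep}, ratio: {std / sep:.2f}")
```

For a 3x3 kernel the saving approaches a factor of 9 as the number of output channels grows, because the pointwise term dominates the separable layer.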

Detection & segmentation

  • TensorFlow implementation of DeepLab for semantic image segmentation using atrous (dilated) convolution.

  • Facebook AI Research library for object detection and segmentation, with reference implementations of Mask R-CNN, RetinaNet, and other architectures.

  • Meta Detection Transformer model for end-to-end object detection with transformers and bipartite matching.

  • Reference Faster R-CNN codebase for region-proposal-based object detection research.

  • OpenMMLab PyTorch toolbox for object detection, instance segmentation, and related vision research.

  • Meta Segment Anything model for promptable image segmentation and mask generation.

  • Meta Segment Anything Model 2 for promptable object segmentation in images and videos.

  • Convolutional neural network architecture for biomedical image segmentation, developed at the University of Freiburg.

  • Ultralytics implementation of YOLO models for object detection, segmentation, pose estimation, and tracking workflows.
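
The DeepLab entry above refers to atrous convolution, which spaces out the kernel taps so a small kernel covers a wider input region without adding weights. The following is a minimal 1-D sketch in plain Python under that definition; the function names are hypothetical, no padding is applied, and real segmentation models use 2-D framework operators rather than this loop.

```python
def effective_kernel_size(k: int, rate: int) -> int:
    """Input span covered by a k-tap kernel with dilation `rate`."""
    return (k - 1) * rate + 1


def atrous_conv1d(signal, kernel, rate=1):
    """Valid (no padding) 1-D cross-correlation with dilation `rate`."""
    span = effective_kernel_size(len(kernel), rate)
    out = []
    for start in range(len(signal) - span + 1):
        acc = 0
        for i, w in enumerate(kernel):
            # Taps are `rate` samples apart instead of adjacent.
            acc += w * signal[start + i * rate]
        out.append(acc)
    return out


if __name__ == "__main__":
    signal = [1, 2, 3, 4, 5, 6, 7]
    kernel = [1, 0, -1]
    print(atrous_conv1d(signal, kernel, rate=1))
    print(atrous_conv1d(signal, kernel, rate=2))
```

With `rate=2` the same three weights span five input samples, which is the property that lets segmentation networks enlarge their receptive field while keeping feature maps at a higher resolution.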

Vision-language & multimodal

  • Salesforce BLIP-2 vision-language pretraining project that bootstraps from frozen image encoders and large language models.

  • OpenAI contrastive vision-language model for connecting images and text in a shared embedding space.

  • Microsoft Florence-2 vision foundation model for captioning, object detection, grounding, and segmentation tasks.

  • Large Language and Vision Assistant project for multimodal chat and visual instruction tuning.

  • Vision-language model using a sigmoid loss for image-text representation learning in Transformers.
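
Several entries above describe models that place images and text in a shared embedding space. The sketch below shows, under simplifying assumptions, how such a space can be used once embeddings exist: captions are ranked by cosine similarity to an image vector. The vectors are made up for illustration and the function names are hypothetical; real models produce these embeddings with learned image and text encoders, which are not shown here.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def rank_captions(image_vec, caption_vecs):
    """Return caption strings sorted by similarity to the image, best first."""
    scores = {text: cosine(image_vec, vec) for text, vec in caption_vecs.items()}
    return sorted(scores, key=scores.get, reverse=True)


if __name__ == "__main__":
    # Toy 3-D embeddings (assumed values, not model outputs).
    image = [0.9, 0.1, 0.0]
    captions = {
        "a photo of a dog": [1.0, 0.0, 0.0],
        "a photo of a cat": [0.0, 1.0, 0.0],
        "a diagram": [0.0, 0.0, 1.0],
    }
    print(rank_captions(image, captions))
```

Ranking by similarity in this way is what makes zero-shot classification and image-text retrieval possible without task-specific training.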

OCR & document AI

  • Mindee document text recognition library for OCR with detection, recognition, and document parsing pipelines.

  • Python OCR library for text detection and recognition across many languages using deep learning models.

  • PaddlePaddle OCR toolkit for multilingual text detection, recognition, and document parsing.

  • Open-source OCR engine for recognizing printed text from images and scanned documents.

  • Transformer-based OCR model in Hugging Face Transformers for printed and handwritten text recognition.

Image processing & augmentation

  • Fast image augmentation library used to improve computer vision model training data pipelines.

  • Differentiable computer vision library for PyTorch with image processing, geometry, augmentation, and vision AI components.

  • Open-source computer vision and machine learning software library for image and video applications.

  • Python Imaging Library fork for opening, manipulating, and saving many image file formats.

  • Python library of image processing algorithms built for scientific and computer vision workflows.

Datasets & platforms

  • COCO is a large-scale dataset for object detection, segmentation, keypoint detection, and vision evaluation.

  • Open Images is a large dataset of images with annotations for classification, detection, and segmentation.

  • Pascal VOC provides benchmark datasets and challenges for visual object classification, detection, and segmentation.

  • Computer vision platform for dataset management, labeling, model training, deployment, and application building.