joelio182 4 days ago

Really cool to see more tooling making self-supervised learning usable on real-world datasets. Domain shift is a recurring pain, especially when labels are limited—so being able to pretrain directly on unlabeled data is a big deal. Also great to see it open-sourced under AGPL. Have you tried LightlyTrain on any more niche domains, like satellite or industrial inspection data? Would be interesting to see how it performs outside the usual benchmarks. Nice work!

  • isusmelj 3 days ago

    Thanks for the kind words, joelio182! Glad you see the value in making SSL more practical for real-world domain shift issues.

    As liopeer mentioned, we have results for medical (DeepLesion) and agriculture (DeepWeeds) in the blog post. We haven't published specific benchmarks on satellite or industrial inspection data yet, but those are definitely the kinds of niche domains where pretraining on domain-specific unlabeled data should yield significant benefits. We're keen to explore more areas like these.

    Our goal is exactly what you pointed out - bridging the gap between SSL research and practical application where labels are scarce. Appreciate the encouragement!

liopeer 4 days ago

Computer Vision pretraining for the masses!

leonax97 4 days ago

Finally a production-ready framework for pretraining!

isusmelj 4 days ago

Hi HN, I’m Igor, co-founder of Lightly AI (https://www.lightly.ai/).

We just released LightlyTrain, a new open-source Python package (AGPL-3.0, free for research and educational purpose) for self-supervised pretraining of computer vision models: https://github.com/lightly-ai/lightly-train

Standard vision models pretrained on generic datasets like ImageNet or COCO often underperform on specific domains (e.g., medical, agriculture, autonomous driving). Fine-tuning helps, but performance is limited, and getting enough labeled data is expensive and slow.

LightlyTrain uses self-supervised learning (SSL) to pretrain models directly on your own unlabeled images or videos. This adapts the model to your specific visual domain before fine-tuning, leading to significantly better performance with less labeled data.

Key Features:

- No Labels Needed: Pretrain using your existing unlabeled image data.

- Better Performance: Consistently outperforms training from scratch and ImageNet-pretrained weights, especially in low-data regimes and domain-specific tasks (benchmarks in README/blog). We see gains across detection, classification, and segmentation.

- Domain Adaptation: Tailor models to your specific industry (manufacturing, healthcare, retail, etc.).

- Supports Popular Models: Works out-of-the-box with YOLO (v5-v12), RT-DETR, ResNet, ViTs, etc., integrating with frameworks like Ultralytics, TIMM, Torchvision.

- Easy to Use & Scalable: Simple pip install, minimal code to start, scales to millions of images, runs fully on-premise (single/multi-GPU).

We built this because while SSL research is mature, making it easily accessible and effective for industry computer vision teams was hard. LightlyTrain aims to bridge that gap.
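To make "minimal code to start" concrete, here is a sketch of the pretraining call following the quickstart pattern in the repo. Treat the parameter names (`out`, `data`, `model`) as assumptions and check the docs for the current API; the `ImportError` guard is only there so the sketch degrades gracefully if the package isn't installed.

```python
def pretrain(data_dir: str, out_dir: str, model: str = "torchvision/resnet50") -> str:
    """Self-supervised pretraining on a folder of unlabeled images.

    Returns the output directory on success, or a hint if the
    lightly-train package is not installed.
    """
    try:
        import lightly_train  # pip install lightly-train
    except ImportError:
        return "lightly-train not installed; run: pip install lightly-train"
    # No labels needed: just point at a directory of images or video frames.
    lightly_train.train(
        out=out_dir,    # checkpoints, logs, and exported weights land here
        data=data_dir,  # directory of unlabeled images
        model=model,    # backbone to pretrain, e.g. a torchvision or TIMM model
    )
    return out_dir

print(pretrain("data/unlabeled_images", "out/pretrain_run"))
```

After pretraining, the exported weights are loaded for fine-tuning on the (much smaller) labeled set in whatever framework you already use, e.g. Ultralytics or Torchvision.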

We’ve benchmarked it on COCO, BDD100K (driving), DeepLesion (medical), and DeepWeeds (agriculture), showing strong improvements over baselines (details in the repo/blog post linked below). For example, on COCO with only 10% labels, LightlyTrain pretraining boosted YOLOv8-s mAP by +14% over ImageNet weights and +34% over no pretraining.

- GitHub Repo: https://github.com/lightly-ai/lightly-train

- Docs: https://docs.lightly.ai/train

- Detailed Blog Post/Benchmarks: https://www.lightly.ai/blog/introducing-lightly-train

- Quick Demo Video: https://youtu.be/5Lmry1k_cA8

We’re here to answer any questions! Happy to discuss the tech, benchmarks, or use cases. Commercial licenses are also available for businesses needing different terms.

  • Sonnigeszeug 4 days ago

    Cool! We use YOLO and have had good success after labeling 1k images, but I'm happy to try it out.

    Does AGPL mean I can't use my model for my own image detection, or does it mean I can't use your software if I wanted to provide a fine-tuning service (which I don't want to)?

    • isusmelj 3 days ago

      Hi Sonnigeszeug, great that you're looking into LightlyTrain!

      We designed LightlyTrain specifically for production teams who need a robust, easy-to-use pretraining solution without getting lost in research papers. It builds on learnings from our MIT-licensed research framework, LightlySSL (github.com/lightly-ai/lightly), but is tailored for scalability and ease of integration.

      For commercial use where the AGPL terms might not fit your needs, we offer straightforward commercial licenses for LightlyTrain. Happy to chat more if that's relevant for you!

      • Sonnigeszeug 3 days ago

        Hey!

        We're a small startup using our own model.

        I'd run a benchmark to see if it's worth it, but what's your pricing?

        And can I assume that the AGPL, in your case, permits training internally?

  • UncleEntity 4 days ago

    > AGPL-3.0, free for research and educational purpose...

    ...or any other purpose allowable under the AGPL like, wait for it, commercial purposes.

    • isusmelj 3 days ago

      You're right, UncleEntity, thanks for highlighting that. My phrasing could have been clearer. AGPL does allow various uses, including commercial, provided its terms are met.

      Our intention with LightlyTrain (AGPL/Commercial license option) is to offer a streamlined, production-ready pretraining engine. This contrasts with our other library, LightlySSL (github.com/lightly-ai/lightly), which is MIT-licensed and geared towards researchers needing flexible building blocks.

      We found many companies wanted a simpler "it just works" solution for pretraining, which is why LightlyTrain exists with its specific licensing options tailored for commercial teams alongside the AGPL.

      Thanks again for the clarification!