TL;DR
- GeoSANE learns a shared latent representation directly from model weights.
- It unifies knowledge from heterogeneous geospatial foundation and task-specific models.
- It can generate models for classification, segmentation and object detection.
- It can also directly generate lightweight models (e.g., MobileNet-scale ~3.5M parameters) that outperform distilled or pruned counterparts.
Abstract
Recent advances in remote sensing have led to a growing number of available foundation models, each trained on different modalities, datasets, and objectives, yet each capturing only part of the vast geospatial knowledge landscape. While these models show strong results within their respective domains, their capabilities remain complementary rather than unified. Therefore, instead of choosing one model over another, we aim to combine their strengths into a single shared representation.
We introduce GeoSANE, a geospatial model foundry that learns a unified neural representation from the weights of existing foundation and task-specific models and can generate novel neural network weights on demand. Given a target architecture, GeoSANE generates weights ready for fine-tuning on classification, segmentation, and detection tasks across multiple modalities.
Models generated by GeoSANE consistently outperform their counterparts trained from scratch, match or surpass state-of-the-art remote sensing foundation models, and outperform models obtained through pruning or knowledge distillation when generating lightweight networks. Evaluations across ten diverse datasets and on GEO-Bench confirm its strong generalization capabilities. By shifting from pre-training to weight generation, GeoSANE introduces a new framework for unifying and transferring geospatial knowledge across models and tasks.
How it works
Method
GeoSANE follows a simple three-stage pipeline: collect remote sensing models, learn a shared latent weight space, and generate new weights for a user-specified prompt architecture.
1. Model collection
GeoSANE is trained on a diverse collection of remote sensing models covering different architectures, modalities, and tasks.
2. Latent weight-space learning
Model parameters are tokenized and processed by a transformer encoder-decoder to learn a shared latent representation of neural network weights.
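The tokenization step can be sketched with a toy example. This is illustrative only: the actual GeoSANE tokenization scheme and transformer are learned components not specified here; `tokenize_weights` and `token_dim` are hypothetical names showing the general idea of turning a model's weight tensors into a token sequence a transformer can process.

```python
import numpy as np

def tokenize_weights(param_tensors, token_dim=16):
    """Flatten parameter tensors into fixed-length tokens (zero-padded).

    Hypothetical sketch: GeoSANE's real tokenizer is not reproduced here;
    this only shows weights being reshaped into a transformer-ready sequence.
    """
    flat = np.concatenate([p.ravel() for p in param_tensors])
    pad = (-len(flat)) % token_dim          # pad so length divides evenly
    flat = np.pad(flat, (0, pad))
    return flat.reshape(-1, token_dim)      # (num_tokens, token_dim)

# Toy "model": two weight matrices with 32 + 8 = 40 parameters total
params = [np.random.randn(8, 4), np.random.randn(4, 2)]
tokens = tokenize_weights(params, token_dim=16)
print(tokens.shape)  # (3, 16): 40 values padded to 48, split into 3 tokens
```

In the actual pipeline, such a token sequence would be fed to the transformer encoder to produce the shared latent representation.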
3. On-demand generation
Given a prompt architecture such as ViT-L or Swin-B, GeoSANE samples in latent space and decodes new weights ready for fine-tuning.
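The decoding side can be sketched in the same toy setting. The random tokens below stand in for a latent sample passed through GeoSANE's learned decoder (not reproduced here); `detokenize` is a hypothetical helper showing only how a flat token sequence maps back onto the prompt architecture's parameter shapes.

```python
import numpy as np

rng = np.random.default_rng(0)

def detokenize(tokens, shapes):
    """Reassemble a token sequence into parameter tensors of the
    prompt architecture's shapes (hypothetical sketch; GeoSANE's
    actual decoder is a learned transformer)."""
    flat = tokens.ravel()
    params, i = [], 0
    for s in shapes:
        n = int(np.prod(s))
        params.append(flat[i:i + n].reshape(s))
        i += n
    return params

# Prompt architecture: parameter shapes of a tiny two-layer net
shapes = [(8, 4), (4, 2)]
# Stand-in for "sample in latent space and decode": random tokens
tokens = rng.standard_normal((3, 16))
weights = detokenize(tokens, shapes)
print([w.shape for w in weights])  # [(8, 4), (4, 2)]
```

The generated tensors then serve as the initialization that is fine-tuned for the downstream task.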
Diverse model collection
The training pool spans foundation models, segmentation models, detectors, multimodal models, and more.
Why this matters
- No need to train yet another foundation model from raw data.
- Knowledge transfer happens directly in weight space.
- The same learned space can generate both large and lightweight models.
Benchmarks
Results
GeoSANE improves over training from scratch, is competitive with strong remote sensing foundation models, and generates lightweight models that outperform pruning and distillation baselines.
Main Takeaways
| Experiment | Takeaway |
|---|---|
| Comparing with training from scratch | GeoSANE improves performance across all ten benchmark datasets. |
| Comparing with other remote sensing foundation models | GeoSANE achieves best or second-best performance across the main benchmarks. |
| Comparing with pruning and distillation | GeoSANE directly generates lightweight models that outperform compressed baselines in most settings. |
| Comparing with prompt models | Generated models consistently outperform the ImageNet-pretrained prompt architectures used for generation. |
Qualitative analysis
Generated models across tasks
GeoSANE is not limited to classification. The learned latent space supports segmentation and object detection through generated backbones fine-tuned for downstream applications.
Flood segmentation
Examples on Sen1Floods11 showing SAR input, prediction, and ground truth.
Object detection
Detection examples on DIOR dataset in aerial imagery.
Latent structure
The learned weight space organizes remote sensing models into meaningful clusters by both architecture and sensing modality.
Citation
BibTeX
@article{XXXX,
title = {GeoSANE: Learning Geospatial Representations from Models, Not Data},
author = {Hanna, Jo{\"e}lle and Falk, Damian and Yu, Stella X. and Borth, Damian},
journal = {arXiv preprint arXiv:XXXX.XXXXX},
year = {2026}
}