RoadFormer: Pyramidal deformable vision transformers for road network extraction with remote sensing images


Complete and accurate road network information serves as important evidence in numerous transportation-related applications. Regular and rapid updating of road network inventories is therefore necessary to provide better services. Remote sensing images, owing to their advantageous overhead Earth-observation perspective, have been widely used to assist road network interpretation tasks. However, accurately separating road content from the surrounding land cover in remote sensing images, while maintaining good connectivity and integrity, remains an open issue because of the remarkably challenging conditions under which roads appear.

To this end, we develop a pyramidal deformable vision transformer architecture, termed RoadFormer, to extract road networks from remote sensing images. Specifically, through a multi-context patch embedding scheme, higher-quality token embeddings are obtained by adopting a multi-range, multi-view context observation strategy. Furthermore, the deformable transformer architecture attends to semantically relevant features in a sparse, global manner, which effectively improves feature representation quality and robustness. The proposed RoadFormer is thoroughly evaluated on three large-scale road network extraction datasets.
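The sparse global attention described above can be illustrated with a minimal NumPy sketch in the spirit of deformable attention: each query predicts a small set of sampling offsets and attention weights, bilinearly samples the feature map at the offset locations, and aggregates the samples. The function names, weight shapes, and the number of sampling points K are illustrative assumptions for this sketch, not the paper's actual implementation.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def bilinear_sample(feat, y, x):
    """Bilinearly interpolate a (H, W, C) feature map at fractional (y, x)."""
    H, W, _ = feat.shape
    y, x = np.clip(y, 0, H - 1), np.clip(x, 0, W - 1)
    y0, x0 = int(np.floor(y)), int(np.floor(x))
    y1, x1 = min(y0 + 1, H - 1), min(x0 + 1, W - 1)
    wy, wx = y - y0, x - x0
    return ((1 - wy) * (1 - wx) * feat[y0, x0]
            + (1 - wy) * wx * feat[y0, x1]
            + wy * (1 - wx) * feat[y1, x0]
            + wy * wx * feat[y1, x1])

def deformable_attention(feat, ref_points, W_off, W_attn, K=4):
    """For each query (reference point), predict K sampling offsets and K
    attention weights from the query feature, sample the map at the offset
    locations, and return the attention-weighted sum (sparse attention:
    only K locations are visited per query, not the whole feature map)."""
    H, W, C = feat.shape
    out = np.zeros((len(ref_points), C))
    for i, (ry, rx) in enumerate(ref_points):
        q = feat[int(ry), int(rx)]            # query feature at reference point
        offsets = (q @ W_off).reshape(K, 2)   # K learned (dy, dx) offsets
        attn = softmax(q @ W_attn)            # K attention weights
        samples = np.stack([bilinear_sample(feat, ry + dy, rx + dx)
                            for dy, dx in offsets])
        out[i] = attn @ samples               # weighted aggregation
    return out

# Toy usage with random weights (illustrative; trained weights would be learned).
rng = np.random.default_rng(0)
feat = rng.standard_normal((8, 8, 16))        # an 8x8 map with 16 channels
refs = [(2.0, 3.0), (5.5, 1.5)]               # two query reference points
W_off = rng.standard_normal((16, 8)) * 0.1    # projects query -> 4 (dy, dx) pairs
W_attn = rng.standard_normal((16, 4))         # projects query -> 4 attn logits
out = deformable_attention(feat, refs, W_off, W_attn, K=4)
```

Because each query visits only K sampled locations instead of all H*W positions, the cost per query is constant, which is what makes this style of attention "sparse" while still allowing offsets to reach anywhere in the map.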

Quantitative assessments show that RoadFormer achieves an overall performance of 0.8886 intersection over union (IoU) and 0.9407 F1-score. In addition, comparative evaluations demonstrate the promising potential and clear superiority of RoadFormer for interpreting road sections of varying circumstances under diverse challenging image scenarios.
