
add subsetting also in data loaders, not just in downloader #11

Open
JoachimKoenigslieb wants to merge 3 commits into dmidk:main from JoachimKoenigslieb:main

Conversation

@JoachimKoenigslieb

Fixes an issue where subsetting is only performed in the download branch of run_mode. As a result, running on local files is very slow, since the optical flow is computed over the whole satellite domain!

To achieve this I have:

  • moved subset_to_bbox to geospatial to avoid circular imports
  • called subset_to_bbox in all branches of run_mode in main
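
A minimal sketch of the second point, under loud assumptions: the loader functions, the mode names, the bounding-box tuple order and the list-of-points data are all hypothetical stand-ins, and this `subset_to_bbox` is a toy version rather than the project's function. It only illustrates one way to make sure every run mode goes through the same subsetting step.

```python
from typing import Callable

# Assumed order: (lon_min, lat_min, lon_max, lat_max)
BBox = tuple[float, float, float, float]
Points = list[tuple[float, float]]  # toy stand-in for a dataset: (lon, lat) pairs

# Toy "full satellite domain" that every hypothetical loader returns
FULL_DOMAIN: Points = [(-10.0, 63.0), (10.0, 55.0), (19.0, 48.0)]


def subset_to_bbox(points: Points, bbox: BBox) -> Points:
    """Toy stand-in: keep only the points that lie inside the bounding box."""
    lon_min, lat_min, lon_max, lat_max = bbox
    return [
        (lon, lat)
        for lon, lat in points
        if lon_min <= lon <= lon_max and lat_min <= lat <= lat_max
    ]


# Hypothetical loaders, one per run mode; each returns the whole domain
LOADERS: dict[str, Callable[[], Points]] = {
    "download": lambda: list(FULL_DOMAIN),
    "files": lambda: list(FULL_DOMAIN),
    "s3": lambda: list(FULL_DOMAIN),
}


def load_data(run_mode: str, bbox: BBox) -> Points:
    """Load data for the given mode, then subset it regardless of the mode."""
    data = LOADERS[run_mode]()
    return subset_to_bbox(data, bbox)


cropped = {mode: load_data(mode, (5.0, 50.0, 15.0, 60.0)) for mode in LOADERS}
```

Here the subsetting happens once after the dispatch, so a newly added mode cannot skip it by accident.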

@KristianHMoller
Contributor

Unfortunately, in its current form, this does not work with our MSG-CPP data from KNMI, as it results in an empty y-dimension:

<xarray.Dataset> Size: 6kB
Dimensions:  (time: 4, y: 0, x: 713)
Coordinates:
  * time     (time) datetime64[us] 32B 2026-05-06T09:15:00 ... 2026-05-06T10:...
  * y        (y) float64 0B 
  * x        (x) float64 6kB -10.73 -10.69 -10.64 -10.6 ... 19.89 19.94 19.98
Data variables:
    crs      |S1 1B b''
    sds      (time, y, x) float32 0B 
    sds_cs   (time, y, x) float32 0B 

I will do a bit more digging and get back to you.

@KristianHMoller
Contributor

Okay, this is a little odd, but the reason it fails is that for the MSG-CPP dataset (at least the way we download it), the latitudes are inverted: y runs from north to south, so it is in descending order. A slice from lat_min to lat_max therefore selects nothing along y.

<xarray.Dataset> Size: 2MB
Dimensions:  (time: 1, y: 368, x: 713)
Coordinates:
  * time     (time) datetime64[us] 8B 2026-05-05T15:15:00
  * y        (y) float64 3kB 63.48 63.43 63.39 63.35 ... 47.4 47.36 47.32 47.27
  * x        (x) float64 6kB -10.73 -10.69 -10.64 -10.6 ... 19.89 19.94 19.98

So perhaps subset_to_bbox could end similarly to this, rather than with our existing implementation, which seems to work only for DWD data.

    # Handle potentially inverted coordinates by detecting order
    lat_values = ds[lat_coord].values
    lon_values = ds[lon_coord].values
    
    lat_ascending = lat_values[0] < lat_values[-1]
    lon_ascending = lon_values[0] < lon_values[-1]
    
    # Arrange slice bounds based on coordinate order
    lat_slice = slice(lat_min, lat_max) if lat_ascending else slice(lat_max, lat_min)
    lon_slice = slice(lon_min, lon_max) if lon_ascending else slice(lon_max, lon_min)

    return ds.sel(
        {
            lat_coord: lat_slice,
            lon_coord: lon_slice,
        }
    )
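
The order check in the suggestion above can be isolated into a small helper. This is only a sketch: `ordered_slice` is a hypothetical name that does not come from the repository, it assumes the coordinate values are monotonic, and the bounding-box numbers are made up for illustration. The sample coordinates are copied from the MSG-CPP dump above.

```python
from typing import Sequence


def ordered_slice(values: Sequence[float], lower: float, upper: float) -> slice:
    """Build a label slice whose bounds follow the order of `values`.

    Hypothetical helper; assumes `values` is monotonic.
    """
    ascending = values[0] < values[-1]
    return slice(lower, upper) if ascending else slice(upper, lower)


# First and last few coordinate values from the MSG-CPP dataset dump
lat_values = [63.48, 63.43, 47.32, 47.27]  # descending (north to south)
lon_values = [-10.73, -10.69, 19.94, 19.98]  # ascending (west to east)

# Illustrative bounding box, not taken from the project configuration
lat_slice = ordered_slice(lat_values, 50.0, 60.0)
lon_slice = ordered_slice(lon_values, 0.0, 15.0)
```

The resulting slices could then be passed to `ds.sel` in the same way as `lat_slice` and `lon_slice` in the suggestion.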

@KristianHMoller
Contributor

KristianHMoller commented May 7, 2026

I think this is a very elegant solution to the issue of not cropping the data in the cases of files and s3!

@JoachimKoenigslieb
Author

I have added your suggested code so that both "directions" of lat/lon grids are handled.

Also added changelog and ran the pre-commit hook!

2 participants