Plotting operational ICON data and benchmarking

Created by: guidocioni

This is not a bug but instead a request of clarifications and additional infos.

I have a suite of scripts which download, process and plot data from the operational ICON model run at DWD and available on opendata.dwd.de. This data contains a LOT of points so optimizing the processing and plotting of data is paramount in order to maintain the script operational every day.

I'm trying to understand whether a transition from my combination of basemap and matplotlib to psyplot would be possible. In order to do that I'm trying to benchmark execution time of psyplot when doing some of my standard plots. In theory, as far as I understand, for filled contour plots psyplot is calling pyplot.tricontourf under the hood, so the difference should not be that big. The only difference, I guess, is an additional layer of preprocessing on the data, which I'm doing anyway in my script.

However I'm having some problems with psy-maps.

First of all, as the files downloaded from DWD opendata server do not have grid information inside (presumably to save space) I need to do a cdo setgrid before passing the file to psyplot.project.plot.mapplot function otherwise the plotting does not work. As far as I understand psyplot does not only need clat and clon from the grid but also the edges. Is it possible to use one of the more primitive methods before mapplot to pass only three 1-dimensional arrays (e.g. clat, clon and data) as done in pyplot.tricontourf? This way I can use an external file to store only the grid information as I currently do.

Second, I tried to reproduce my basic plot that you see here which I simply do like this:

dset= read_dataset(variables=['T'])
lat, lon = get_coordinates()
temp_850 = dset['t'].metpy.sel(vertical=850 * units.hPa).load()
temp_850.metpy.convert_units('degC')
fig = plt.figure(figsize=(15, 10))
ax  = plt.gca()
_, x, y = get_projection(lon, lat, "world")
cs = ax.tricontourf(x, y, temp_850[0], extend='both', cmap=get_colormap("temp"),
                                    levels=np.arange(-35., 30., 1.))

read_dataset, get_projection and get_coordinates are functions that I wrote to simplify the code. They only wrap some xarray and basemap calls. This takes about 34 seconds of wall time; the input array has dimensions time: 93, ncells: 2949120. The coordinates lat and lon are saved in a different file.

Trying to do the same with psy-maps I first created a file where I also store the entire grid information inside (not only clat and clon otherwise it doesn't work as said before)

psy.plot.mapplot('~/Downloads/temp/test_psy/T_2020093006_global_grid.nc',
                 name='t', projection='robin',
                 levels=np.arange(238, 303, 2.), cmap=get_colormap("temp"),
                 )

I get this result which takes about 50 seconds. It seems that the levels parameter is completely ignored and somehow I cannot pass the pyplot axis to mapplot to control the figure size.

Do you have any idea on how to create a plotting script that I could use to reproduce my plotting setup so that I could benchmark it and see whether it is really faster?

Right now in the documentation of psy-maps there are only examples with really small grids so I'm not sure how the packages would behave for larger grids.