ERDAS ECW JPEG2000 v5.0 performance teaser

As the team starts tying up loose ends before the ERDAS ECW JPEG2000 SDK v5 release I thought I’d post a teaser of some performance metrics we’ve collected as part of the QA. To keep it somewhat relevant to the average user and not just developers I have compiled some stats comparing throughput using a variety of ECW and JPEG2000 drivers using the GDAL utilities.

The real aim here was to ensure no regressions for our own ERDAS SDK with recent driver enhancements but it also allows us to draw some other interesting conclusions relevant for anyone using ECW and JPEG2000 formats (via GDAL or not).

Test Plan

  1. Grab a handful of customer sourced, real-world ECW and JPEG2000 images and use them for straight-line decoding via GDAL
  2. Grab a handful of customer sourced, raw uncompressed TIF and IMG images and use them for straight-line encoding via GDAL
  3. Perform the same tests on Windows and *gasp* Linux
  4. Run two cold test runs and plot the average result
  5. Hope we outperform both our previous SDK releases and the other JPEG2000 drivers
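The "straight-line" runs above are just timed CLI invocations. A minimal sketch of such a harness in Python, assuming a shell `time`-style wall-clock measurement (the actual QA scripts aren't published, so this is purely illustrative):

```python
import subprocess
import time

def timed_run(cmd, runs=2):
    """Run a shell command `runs` times and return the mean wall-clock
    seconds, mirroring the 'two cold runs, plot the average' step above.
    (Illustrative only; the real QA harness isn't published.)"""
    elapsed = []
    for _ in range(runs):
        start = time.perf_counter()
        subprocess.run(cmd, shell=True, check=True,
                       stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
        elapsed.append(time.perf_counter() - start)
    return sum(elapsed) / len(elapsed)

# e.g. timed_run('gdal_translate -of GTiff world.ecw test.tif')
```

For genuinely cold runs you would also drop the OS file cache between iterations; that step is omitted here for brevity.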

Test data

  • world.ecw, 86,400 x 43,200 px, 3-band uint8
  • large_ortho.ecw, 550,000 x 390,000 px, 3-band uint8
  • ADS80.ecw, 75,332 x 10,513 px, 4-band uint16, lossy 20:1 (ECW v3)
  • perth.ecwp, 12,445 x 10,594 px, 3-band uint8
  • world.jp2, 86,400 x 43,200 px, 3-band uint8
  • CIR_ortho.jp2, 60,202 x 47,934 px, 4-band uint8
  • epje-nitf.jpc, 5,654 x 11,677 px, 8-band uint8 (NITF EPJE profile)
  • pleiades.jp2, 8,496 x 16,384 px, 4-band uint8
  • ADS80.jp2, 75,332 x 10,513 px, 4-band uint16, lossy 20:1
  • landsat_7.img, 12,661 x 12,601 px, 7-band uint8
  • naturalearth.tif, 21,600 x 10,800 px, 3-band uint8
  • pan.tif, 35,180 x 28,184 px, 1-band uint8
  • ads40.img, 12,236 x 12,196 px, 3-band uint8

Test cases

Decoding

  • ECW #1 : time gdal_translate -of GTiff -co "COMPRESS=JPEG" -co "PHOTOMETRIC=YCBCR" -co "TILED=yes" world.ecw test.tif
  • ECW #2 : time gdallocationinfo world.ecw 2156 30321
  • ECW #3 : time gdal_translate -of GTiff -co "TILED=yes" -projwin 319794 5754321 322641 5752034 large_ortho.ecw test.tif
  • ECW #4 : time gdal_translate -of GTiff -co "TILED=yes" -srcwin 20000 5000 1500 1500 ADS80.ecw test.tif
  • ECWP #1 : time gdal_translate -of GTiff -co "COMPRESS=JPEG" -co "PHOTOMETRIC=YCBCR" -co "TILED=yes" ecwp://demo-apollo.geospatial.intergraph.com/images/australia/perth_05mar04_psh.ecw test.tif
  • JP2 #1 : time gdal_translate -of GTiff -co "COMPRESS=JPEG" -co "PHOTOMETRIC=YCBCR" -co "TILED=yes" world.jp2 test.tif
  • JP2 #2 : time gdal_translate -of GTiff -co "TILED=yes" CIR_ortho.jp2 test.tif
  • JP2 #3 : time gdal_translate -of GTiff -co "TILED=yes" epje-nitf.jpc test.tif
  • JP2 #4 : time gdal_translate -of GTiff -co "TILED=yes" pleiades.jp2 test.tif
  • JP2 #5 : time gdal_translate -of GTiff -co "TILED=yes" -b 2 -b 3 -b 4 -srcwin 2000 5000 800 800 CIR_ortho.jp2 test.tif
  • JP2 #6 : time gdal_translate -of GTiff -co "TILED=yes" -srcwin 20000 5000 1500 1500 ADS80.jp2 test.tif

Encoding

  • ECW #1 : time gdal_translate -of ECW -co "TARGET=95" landsat_7.img test.ecw
  • ECW #2 : time gdal_translate -of ECW -co "TARGET=95" NaturalEarth.tif test.ecw
  • ECW #3 : time gdal_translate -of ECW -co "TARGET=95" pan.tif test.ecw
  • ECW #4 : time gdal_translate -of ECW -co "TARGET=95" ads40.img test.ecw
  • JP2 #1 : time gdal_translate -of JP2ECW -co "TARGET=95" landsat_7.img test.jp2
  • JP2 #2 : time gdal_translate -of JP2ECW -co "TARGET=95" NaturalEarth.tif test.jp2
  • JP2 #3 : time gdal_translate -of JP2ECW -co "TARGET=95" pan.tif test.jp2
  • JP2 #4 : time gdal_translate -of JP2ECW -co "TARGET=95" -co "PROGRESSION=LRCP" NaturalEarth.tif test.jp2
  • JP2 #5 : time gdal_translate -of JP2ECW -co "TARGET=0" NaturalEarth.tif test.jp2

Environment

  • Windows:
    • Windows 7 x64
    • GDAL 1.9.2 x64 VC100
    • Core i7 1.7 GHz, 8 GB RAM
    • Data read from Corsair M4 256 GB SSD, written to attached 7,200 rpm SATA
  • Linux (virtualized):
    • Ubuntu Server 12.04 x64
    • gcc 4.7.2
    • GDAL 1.10 (svn trunk)
    • 4 cores at 1.7 GHz, 4 GB RAM
    • Data read from Corsair M4 256 GB SSD, written to attached 7,200 rpm SATA

Note 1: The same libecwj2 3.3 library was used on both platforms. The same public patches were applied to mirror what is used in the wild.

Note 2: The environments are not intended to be compared. Please keep this in mind when looking at the results and do not try to draw any Linux vs Windows conclusions.

Results

[Charts: ECW/JP2 Linux decoding, Linux encoding, Windows decoding, Windows encoding]

Talking points

  1. If you currently use v3 SDK on Linux, v5 will give you significant decoding and encoding improvements no matter how you use our SDK
  2. JPEG2000 stability and robustness in v5 are much improved. You will have to take my word for it right now, but decoding test JP2 #1 shows 0, which indicates a crash in the old SDK. Test JP2 #4 shows a performance drop; however, the v3 SDK didn't properly decode all precincts within the file, so although it completed quickly the output was not valid. I am sure many who have used v3 would agree that it had poor compatibility with particular JP2 format profiles, so v5 should pleasantly surprise you.
  3. What about (geo) Jasper? What about OpenJPEG? I tested both on Linux, however the drivers and/or SDKs are simply not suitable for these images. I recompiled three times, each with the suggested patches, with no change in behaviour. Both SDKs only succeeded in completing JP2 #3: Jasper in 9842 secs (I gave up at 50% and extrapolated it out) and OpenJPEG in 387.32 secs. All other decoding tests failed with out-of-memory errors or seg faults, indicating fundamental limitations of these toolkits when decoding geospatially-sized imagery. These are not large images, so it was surprising to see such poor results from the open source JP2 drivers. Beware of drawing any negative conclusions about JPEG2000 as a format when using these drivers.
  4. What about Kakadu? I did not have access to the recent kdu release so did not include the library in these results. I'm sure someone else will soon.
  5. ECWP decoding gets a significant speedup due to the new ECWP v3 protocol, which opens multiple threads when retrieving the remote image stream. Anyone curious about the GDAL Async Reader should definitely check this out, as I'm yet to see any third-party apps using this feature. Although not tested here, ECWP progressive reading should outperform JPIP from the JP2KAK driver by a significant margin, even when reading from the same input JP2 (ECW over ECWP will be faster still).
  6. Decoding tests JP2 #1 and ECW #1 are equivalent except for the storage format. The test results clearly show that for equivalent compressed files, GDAL JP2 decoding is 2x (Windows) to 3x (Linux) slower than the same ECW. For multi-threaded use this gap will widen further in favour of ECW. This will be most noticeable in workflows such as tile cache generation or enterprise imagery services. Remember, wavelet formats aren't all the same.
  7. Encoding tests JP2 #2 and ECW #2 are also equivalent. In this case, JP2 compression is 10% (Windows) to 30% (Linux) slower. Although not charted, ECW output file size was 13,265 KB vs 15,060 KB for JP2 at the same GDAL quality target of 95 (or 20:1 ratio).
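As a back-of-the-envelope check on that last point (my own arithmetic, not part of the QA runs): the TARGET creation option in GDAL's ECW/JP2ECW drivers is a percentage size reduction, so TARGET=95 keeps 5% of the data, which is where the 20:1 figure comes from:

```python
# TARGET in the GDAL ECW/JP2ECW drivers is the percentage size
# reduction, so the implied compression ratio is 100 / (100 - target).
def target_to_ratio(target_percent: float) -> float:
    return 100.0 / (100.0 - target_percent)

# File sizes quoted above for the equivalent encodes (in KB).
ecw_kb, jp2_kb = 13_265, 15_060

print(target_to_ratio(95))        # -> 20.0
print(round(jp2_kb / ecw_kb, 2))  # -> 1.14, i.e. the JP2 is ~14% larger
```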
[Image: corrupt JP2 display]

Although it looks pretty, corrupt JP2 display like this will be rarely seen with v5

When will it be out? Very soon, so stay tuned ..


Mythbusting ECW decompression

Anyone who frequents the ESRI ArcGIS Desktop v10 publishing wizard will be familiar with the screenshot below. But has anyone stepped back and thought, hang on, why is wavelet compression bad? Many readers may recall we have had an ESRI ArcPAD ECW plugin since 2003. Way back then it was powered by tiny 300 MHz mobile CPUs, and draw performance for ECWs was instant. What has changed with wavelet technology that makes it something to be fearful of in server use in 2012? Have we really gone backwards?

Full disclosure: As of writing, I’m currently Product Manager for APOLLO IWS

As per, http://help.arcgis.com/en/arcgisdesktop/10.0/help/index.html#//00sq0000000t000000.htm

*ahem, in my best robot voice ..*

You have a raster map layer that uses wavelet image compression (for example, MrSID, JPEG 2000, ECW), which can impede map drawing performance.

What should I do, Raster robot!?

..convert your wavelet-compressed raster dataset to a more efficient display format

Why?

… because the data does not have to be uncompressed at display time.

Ok, semi-plausible by the sounds. Is there anything else published to qualify these claims I wonder?

http://proceedings.esri.com/library/userconf/pug11/papers/arcgis_server-performance_and_scalability-optimization_and_testing.pdf

"Avoid wavelet compression-based raster types (MrSID, JPEG2000)" (page 6)

"Tiled, JPEG compressed TIFF is the best (10-400% faster)" (page 11)

- Andrew Sakowicz, ESRI Professional Services Redlands, April 2011

Wow. This seems to be scattered everywhere; it must be true.

Let’s verify what another Image Server has to say just in case. We can’t go past the FOSS4G Geoserver on steroids paper and I’m not quite sure what happened to the text ..

So armed with an ECW file, I know it meets the above objectives. I wonder what Geoserver states about performance  ..

PROPRIETARY!? Oh no; but hang on, what is the performance? More digging at http://opengeo.org/publications/geoserver-production/ gives a small nugget, but alas no details:

In addition to adding overviews, using raster formats based on wavelet transforms such as ECW and MrSID will also improve performance

Further digging into the FOSS4G '09 raster benchmarking results, with the same test data I'm about to use, shows Geoserver ECW (260 MB) peaking at 11.2 maps/sec vs uncompressed tiled TIF raster (~16,000 MB) at 13.7 maps/sec. Admittedly these results are out of date; nevertheless, the performance gain of roughly 20% came at an enormous storage cost. Image quality unfortunately wasn't analyzed, nor was JPEG compression. So there does seem to be at least a little performance penalty, in the old Geoserver version anyway, and only compared against uncompressed tiled TIF. So not really a useful reference in this context.

And now the kicker that I bet half of you are salivating over: the ECW SDK EULA requires a paid license from Intergraph | ERDAS for use in a server environment. Money!? Absurd.

So how about we actually verify these claims? Based on the variety of literature found in a whole ten minutes, there are countless people saying that JPEG compressed TIF should at least match ECW performance. ESRI tells me that wavelet compression is bad, takes longer to decompress and is proprietary. JPEG TIFF is 30% bigger but looks the same and is as fast as ECW, apparently. (ref)

But if all of that is true, why on Earth does Intergraph | ERDAS think it can charge for it, and why is it nigh impossible to validate all these claims? Let's find out through a simple example ..

Take my favourite small sample image,

Driver: GTiff/GeoTIFF
Files: world-topo-bathy-200406-3x86400x43200.tif
Size is 86399, 43199
Coordinate System is:
GEOGCS["WGS 84",
DATUM["WGS_1984",
SPHEROID["WGS 84",6378137,298.257223563,
AUTHORITY["EPSG","7030"]],
AUTHORITY["EPSG","6326"]],
PRIMEM["Greenwich",0],
UNIT["degree",0.0174532925199433],
AUTHORITY["EPSG","4326"]]
Origin = (-180.000000000000000,90.000000000000000)
Pixel Size = (0.004166666666667,-0.004166666666667)
Image Structure Metadata:
INTERLEAVE=PIXEL
Corner Coordinates:
Upper Left  (-180.0000000,  90.0000000) (180d 0' 0.00"W, 90d 0' 0.00"N)
Lower Left  (-180.0000000, -89.9958333) (180d 0' 0.00"W, 89d59'45.00"S)
Upper Right ( 179.9958333,  90.0000000) (179d59'45.00"E, 90d 0' 0.00"N)
Lower Right ( 179.9958333, -89.9958333) (179d59'45.00"E, 89d59'45.00"S)
Center      (  -0.0020833,   0.0020833) (  0d 0' 7.50"W,  0d 0' 7.50"N)
Band 1 Block=256x256, ColorInterp=Red
Min=1.000 Max=10.000
Minimum=1.000, Maximum=10.000, Mean=1.615, StdDev=0.746
Overviews: 43200x21600, 21600x10800, 10800x5400, 5400x2700
Band 2 Block=256x256, ColorInterp=Green
Min=3.000 Max=30.000
Minimum=3.000, Maximum=30.000, Mean=8.209, StdDev=1.499
Overviews: 43200x21600, 21600x10800, 10800x5400, 5400x2700
Band 3 Block=256x256, ColorInterp=Blue
Min=12.000 Max=68.000
Minimum=12.000, Maximum=68.000, Mean=22.177, StdDev=2.873
Overviews: 43200x21600, 21600x10800, 10800x5400, 5400x2700

Take the uncompressed dataset (a whopping 15 GB!) and create the preferred format to "optimize space, quality and speed" using GDAL 1.8.1. You will see I have enabled all the optimal flags, including YCbCr with average resampling:

gdal_translate -of GTiff -co "TILED=yes" -co "PHOTOMETRIC=YCBCR" -co "COMPRESS=JPEG" world-topo-bathy-200406-3x86400x43200.tif gdal_compressed_world.tif

Input file size is 86399, 43199

0…10…20…30…40…50…60…70…80…90…100 – done.

gdaladdo -r average --config COMPRESS_OVERVIEW JPEG --config PHOTOMETRIC_OVERVIEW YCBCR --config INTERLEAVE_OVERVIEW PIXEL gdal_compressed_world.tif 2 4 8 16 32 64 128

0…10…20…30…40…50…60…70…80…90…100 – done.

Compression time was recorded, with gdal_translate taking 8 minutes and gdaladdo 19 minutes, for a total creation time of 27 minutes.

Driver: GTiff/GeoTIFF
Files: gdal_compressed_world.tif
Size is 86399, 43199
Coordinate System is:
GEOGCS["WGS 84",
DATUM["WGS_1984",
SPHEROID["WGS 84",6378137,298.257223563,
AUTHORITY["EPSG","7030"]],
AUTHORITY["EPSG","6326"]],
PRIMEM["Greenwich",0],
UNIT["degree",0.0174532925199433],
AUTHORITY["EPSG","4326"]]
Origin = (-180.000000000000000,90.000000000000000)
Pixel Size = (0.004166666666667,-0.004166666666667)
Metadata:
TIFFTAG_SOFTWARE=ERDAS IMAGINE
TIFFTAG_XRESOLUTION=1
TIFFTAG_YRESOLUTION=1
TIFFTAG_RESOLUTIONUNIT=1 (unitless)
AREA_OR_POINT=Area
Image Structure Metadata:
SOURCE_COLOR_SPACE=YCbCr
COMPRESSION=YCbCr JPEG
INTERLEAVE=PIXEL
Corner Coordinates:
Upper Left  (-180.0000000,  90.0000000) (180d 0' 0.00"W, 90d 0' 0.00"N)
Lower Left  (-180.0000000, -89.9958333) (180d 0' 0.00"W, 89d59'45.00"S)
Upper Right ( 179.9958333,  90.0000000) (179d59'45.00"E, 90d 0' 0.00"N)
Lower Right ( 179.9958333, -89.9958333) (179d59'45.00"E, 89d59'45.00"S)
Center      (  -0.0020833,   0.0020833) (  0d 0' 7.50"W,  0d 0' 7.50"N)
Band 1 Block=256x256, ColorInterp=Red
Min=1.000 Max=10.000
Minimum=1.000, Maximum=10.000, Mean=1.615, StdDev=0.746
Overviews: 43200x21600, 21600x10800, 10800x5400, 5400x2700, 2700x1350, 1350x675, 675x338
Band 2 Block=256x256, ColorInterp=Green
Min=3.000 Max=30.000
Minimum=3.000, Maximum=30.000, Mean=8.209, StdDev=1.499
Overviews: 43200x21600, 21600x10800, 10800x5400, 5400x2700, 2700x1350, 1350x675, 675x338
Band 3 Block=256x256, ColorInterp=Blue
Min=12.000 Max=68.000
Minimum=12.000, Maximum=68.000, Mean=22.177, StdDev=2.873
Overviews: 43200x21600, 21600x10800, 10800x5400, 5400x2700, 2700x1350, 1350x675, 675x338

An equivalent ECW was then created using ERDAS Imagine from the same uncompressed TIF with a 20:1 target compression ratio. Compression / creation time as below.

For consistency, gdalinfo output is below.

Driver: ECW/ERDAS Compressed Wavelets (SDK 3.x)
Files: world-topo-bathy-200406-3x86400x43200-20x.ecw
Size is 86399, 43199
Coordinate System is:
GEOGCS["WGS 84",
DATUM["WGS_1984",
SPHEROID["WGS 84",6378137,298.257223563,
AUTHORITY["EPSG","7030"]],
TOWGS84[0,0,0,0,0,0,0],
AUTHORITY["EPSG","6326"]],
PRIMEM["Greenwich",0,
AUTHORITY["EPSG","8901"]],
UNIT["degree",0.0174532925199433,
AUTHORITY["EPSG","9108"]],
AXIS["Lat",NORTH],
AXIS["Long",EAST],
AUTHORITY["EPSG","4326"]]
Origin = (-180.000000000000030,90.000000000000014)
Pixel Size = (0.004166666666667,-0.004166666666667)
Corner Coordinates:
Upper Left  (-180.0000000,  90.0000000) (180d 0' 0.00"W, 90d 0' 0.00"N)
Lower Left  (-180.0000000, -89.9958333) (180d 0' 0.00"W, 89d59'45.00"S)
Upper Right ( 179.9958333,  90.0000000) (179d59'45.00"E, 90d 0' 0.00"N)
Lower Right ( 179.9958333, -89.9958333) (179d59'45.00"E, 89d59'45.00"S)
Center      (  -0.0020833,   0.0020833) (  0d 0' 7.50"W,  0d 0' 7.50"N)
Band 1 Block=86399x1, ColorInterp=Red
Overviews: 43199x21599, 21599x10799, 10799x5399, 5399x2699, 2699x1349, 1349x674, 674x337, 337x168
Band 2 Block=86399x1, ColorInterp=Green
Overviews: 43199x21599, 21599x10799, 10799x5399, 5399x2699, 2699x1349, 1349x674, 674x337, 337x168
Band 3 Block=86399x1, ColorInterp=Blue
Overviews: 43199x21599, 21599x10799, 10799x5399, 5399x2699, 2699x1349, 1349x674, 674x337, 337x168

Storage requirements for both outputs,

  • JPEG Compressed TIFF (75%): 362,953 KB
  • ECW (20:1): 109,650 KB

So we have at least squashed one of the referenced claims. JPEG Compressed TIFF (with YCbCr) is approximately 3.3x larger for this dataset, not just 30%. Note that the storage requirements include the gdaladdo embedded overviews; you'd be surprised how many people do not factor these in.
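That multiplier is simple division of the two sizes quoted above (my own arithmetic, using the numbers from this post):

```python
# Storage figures from above, in KB.
jpeg_tiff_75_kb = 362_953  # tiled GTiff, JPEG/YCbCr 75%, with overviews
ecw_20to1_kb = 109_650     # ECW at a 20:1 target, overviews built in

ratio = jpeg_tiff_75_kb / ecw_20to1_kb
print(round(ratio, 1))  # -> 3.3 (versus the claimed "30% bigger")
```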

As I wanted to investigate the decompression performance, I created additional ECWs, but this time with a much lower target ratio to try to get a file size similar to the JPEG TIFF. This in theory would make the disk I/O "fairer" in any subsequent performance test. It is however critical to note that we recommend a target ratio of 15:1 to 20:1 to retain visually lossless RGB imagery. Creating a file with such a small target will give minimal quality difference (take a look at the MP4 below) at the expense of storing and reading a lot more data.

  • ECW (6:1): 263,691 KB
  • ECW (3:1): 289,106 KB

Because of the way the compression algorithm works, even with a very small target ratio the actual compression rate was still quite high, as the image's large water bodies compress very well.

An additional JPEG compressed TIF was created with -co "JPEG_QUALITY=90", as well as gdaladdo with JPEG_QUALITY_OVERVIEW, to improve on the default 75% quality (see here). With all other GDAL flags being equal this produced:

  • JPEG Compressed TIFF (90%): 587,439 KB

Now we have some data, let's configure some tests. I will of course be testing ERDAS APOLLO IWS v11.0.2 on Windows 7 x64. Test hardware is a quad core, 8 thread Core i7 with 8 GB RAM and a single attached 7,200 rpm disk.

I usually find a visual comparison helpful to further understand any subsequent JMeter metrics. After all, how much difference is 200ms, 50ms? To do this I created an OpenLayers website with synchronised maps and on each the loadend event is captured so a JS table can be displayed to give indicative performance values. Remember, this is not a load test and so system resources are not in contention.

MP4 screen recording: ecw-vs-tif-format-comparison (3 mins)

JMeter was configured to iterate through a small test plan of 100 WMS requests across the image in a single thread group to make a repeatable test. Unlike the FOSS4G benchmarking, the tests will all be cold-start, as I hate "warm" tests with a passion. Servers are restarted and then each thread group is run consecutively with 1 thread. FYI, IWS internally uses the v4.2 ECW SDK, as well as GDAL 1.7 for the TIF reading.

Before analyzing the above I can already hear you thinking: damn, your software just sucks at reading TIF, doesn't it!? You are skewing the results!

So the next step is independent verification. Without this, I would be no better than the other quotes spouting 400% improvements compared with … *undisclosed*

  1. Grab the latest Geoserver v2.1.3 jetty build.
  2. Use the existing JDK 1.7 x64 version,
  3. Configure two new data/layers pointing to the same TIF datasets
  4. Enable JAI “JPEG Native Acceleration”
  5. Coverage Access / Queue Type: “UNBOUNDED”
  6. Suggested Tile size 512,512
  7. Default interpolation type: Nearest Neighbour

Note: This is just a ballpark comparison. No JVM tuning or any other settings were changed from default. Remember, IWS was also an out-of-the-box configuration so I paid Geoserver some extra attention :)

Both servers were then restarted and JMeter test plan refreshed. Results are as follows,

Label                  | Samples | Average | Median | 90% Line | Min | Max  | Error % | Throughput | KB/sec
IWS 4326 ECW 20:1      | 100     | 116     | 115    | 159      | 59  | 278  | 0       | 8.50557115 | 586.0675
IWS 3857 ECW 20:1      | 100     | 171     | 167    | 214      | 111 | 261  | 0       | 5.79038796 | 432.9373
IWS 4326 ECW 3:1       | 100     | 111     | 118    | 154      | 52  | 193  | 0       | 8.90234132 | 612.3722
IWS 3857 ECW 3:1       | 100     | 174     | 167    | 214      | 109 | 644  | 0       | 5.70678537 | 426.2425
IWS 4326 75% TIF       | 100     | 198     | 203    | 290      | 74  | 546  | 0       | 5.01781324 | 313.2798
IWS 3857 75% TIF       | 100     | 225     | 226    | 302      | 119 | 444  | 0       | 4.40858793 | 281.9394
IWS 4326 90% TIF       | 100     | 201     | 206    | 289      | 74  | 510  | 0       | 4.927322   | 311.4298
IWS 3857 90% TIF       | 100     | 228     | 237    | 299      | 126 | 611  | 0       | 4.35179947 | 286.429
Geoserver 4326 TIF 75% | 100     | 201     | 179    | 245      | 94  | 1449 | 0       | 4.92926505 | 244.4857
Geoserver 3857 TIF 75% | 100     | 289     | 252    | 511      | 100 | 1044 | 0       | 3.43902607 | 194.1044
Geoserver 4326 TIF 90% | 100     | 194     | 169    | 238      | 103 | 902  | 0       | 5.1266277  | 261.2856
Geoserver 3857 TIF 90% | 100     | 276     | 257    | 469      | 98  | 606  | 0       | 3.6101083  | 208.9514

(Average, Median, 90% Line, Min and Max are response times in ms; Throughput is requests/sec.)
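The ECW-over-TIF speedup can be read straight off the throughput column; a quick sanity check on the IWS rows (values copied from the table, no new measurements):

```python
# Single-thread, cold-start throughput in requests/sec, from the table.
throughput = {
    "IWS 4326 ECW 20:1": 8.50557115,
    "IWS 4326 75% TIF": 5.01781324,
    "IWS 3857 ECW 20:1": 5.79038796,
    "IWS 3857 75% TIF": 4.40858793,
}

# ECW vs 75% JPEG TIF on the same server, same projection.
print(round(throughput["IWS 4326 ECW 20:1"] / throughput["IWS 4326 75% TIF"], 2))
print(round(throughput["IWS 3857 ECW 20:1"] / throughput["IWS 3857 75% TIF"], 2))
```

This works out to roughly 1.3x to 1.7x at a single thread; the wider gap under concurrency is where the "almost 2x" headline figure comes from.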

So what does all this mean?

  • Firstly, IWS TIF throughput performance was very similar to Geoserver's. I don't really want to get into arguments with OpenGeo/Geosolutions on this, as we could be here for days tuning, and that is not really the point of the post. For all intents and purposes, one can adequately say the APOLLO IWS TIF implementation is not encumbered in any way
  • ECW outperforms JPEG compressed TIF by almost 2x
  • ECW 20:1 produces higher image quality than a TIF compressed at 90% JPEG quality
  • ECW 20:1 requires 5.3x less data storage than the 90% JPEG compressed TIF
  • The image quality difference between 3:1 and 20:1 ECW was trivial and not worth a 2.5x increase in storage.
  • A performance drop was not recorded with the varying compression levels, regardless of format. This is expected given the low concurrency
  • ECW 20:1 compression/creation time is 1.5x faster than 75% JPEG compressed TIF

Summary

ECW when compared with the “JPEG compressed tiled Geotiff with embedded overviews” alternative format,

  • Can be created quicker (compress)
  • Requires 5x less disk storage
  • Retains higher image quality
  • Serves output imagery 2x faster (decompress)
  • Requires an ECW write and server license from ERDAS

For many of our customers with hundreds of terabytes and even petabytes of image data, the business justifications for license acquisition are all there. Whether it be talking to your IT area, who have to manage increasing SAN costs; the data capture area, who want to ensure quality is retained; or the end customer, who just wants the imagery served to them as quickly as possible. The license fees are what we determine to be a fair market price, but unfortunately many would prefer to ignore the benefits (or pretend they didn't exist) and in the end cost their organization more money in the process.

I suspect this post will attract the usual suspects, but if I can leave one thing in your minds it would be the following tweet from me,

Take-aways

  1. Wavelet based formats ECW, JPEG2000 and MrSID should never be tarred with the same "wavelet" brush and grouped together. They do not perform the same, and grouping them makes about as much sense as grouping the variety of compression and structure options available for GeoTIFF and just saying "GeoTIFF is slow"
  2. If your current server software does in fact read ECW more slowly, then that is likely an architectural constraint or a poor implementation of the ECW SDK. If you are still using DCOM multi-process based software then your mileage will vary. Fear not though, there is a better solution out there :)

Now where's my popcorn. It's nice to post again ..

One confused image monkey

Melanie Harlow’s blog @ ESRI has left me like one of these …

Hmmm? Conversion from 89 MB of MrSIDs to JPEG2000 ballooned to 922 MB. What exactly does 75% compression mean in that context? Sounds pretty incorrect to me unless 75% means lossless, which isn't exactly a fair test. Recompress to ~15:1 lossy JP2 and I would expect the total dataset size to be largely the same as the original MrSID.

Hmmm? I do believe you are missing a format in your comparison. ECW will save you space, use a complex algorithm, and improve the performance of reading the data. The quoted statement below is completely wrong: sticking with a more verbose format like TIFF will require you to read more data and more files, and offers less effective caching mechanisms.

Before you go off and convert all your raster data to a highly compressed format to save space, please note that the more complex the compression the slower it is to read the data. Therefore, we generally recommend when speed is your #1 concern, that you use a TIFF raster dataset with JPEG compression

Hmmm? If the output mosaic is only 30,000 x 20,000 pixels then why not do a proper mosaic into a single MrSID, JP2 or ECW file? Keeping small tiled datasets removes the benefits these wavelet formats provide, gives worse performance and requires you to store redundant overview data that's simply not required if kept as a single file.

A good example on a larger dataset was the FOSS4G 2010 benchmarking raster over Barcelona: a 3-band, 8-bit, 220,000 x 150,000 pixel dataset. Mosaicked to one ECW = 4.8 GB. The equivalent JPEG compressed TIFF tiles totalled 112 GB. That's a significant difference without even looking at the performance gains.

I will wait for Part Deux, which I'm sure will clarify this =)

The tilecache goldrush

Does anyone else not see a problem with the trend over the past few years? “Tile-itis” is reaching critical mass and it is driving me bonkers. We’re taking away styling, reprojection, tile sizes and giving them … tiles. No wait, fast tiles? Really? Oh, so I can put them on Google Maps? Awesome. Can I have them in projection X? No, sorry, we don’t have another terabyte to reseed the cache. Can I have just the streets? No sorry, same problem.

Why do I seem like the only one asking “wtf” when I see something like this at OAM,

This means, as a rule of thumb, that the network must store ((4/3) + 1) * 3 = 7 MB of imagery plus tiles for every 1 MB of source imagery uploaded. If we load up all of the approximately 4 TB of LandSat-7 data at a 30m resolution, and generate a complete tile set, we will need 16-28 TB of storage in the network to hold it all. If stored on EC2, this would cost up to US$3,000 per month — and that’s just for one layer at a low resolution.
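The quoted ((4/3) + 1) * 3 = 7 rule of thumb unpacks as follows: a full tile pyramid adds a geometric series of overview levels on top of the base tiles (each zoom level up is a quarter the size), the +1 is the source imagery itself, and I'm assuming the *3 is OAM's replication factor across the network:

```python
# Tile pyramid storage: each zoom level up has 1/4 the tiles of the
# level below, so a deep pyramid totals 1 + 1/4 + 1/16 + ... ~= 4/3
# of the base imagery size.
pyramid = sum((1 / 4) ** level for level in range(20))

# Plus the source itself (+1), times what I take to be OAM's
# replication factor of 3 (an assumption on my part).
per_source_mb = (pyramid + 1) * 3

print(round(pyramid, 4))        # -> 1.3333
print(round(per_source_mb, 2))  # -> 7.0, matching the quoted 7 MB per 1 MB
```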

Or when a user asks a simple question

We want to serve the US NAIP Aerials in 1m resolution (which are a total of about 4.7 TB of MrSid/Jp2 data) on a interactive  web map as an optional map background. [sic] .. we determined early on is that MapServer is too slow to serve compressed imagery such as the native MrSid Jp2 imagery on the fly for our needs. [On using Mapserver to serve uncompressed tifs] … would also “blow up” the total data volume to something about 60 TB … Thus, we are in the process of researching options on how to serve the compressed data as fast as possible “on the fly” and without the need for caching them on disk

All replies, except one from (somewhat ironically :) ) Christopher Schmidt, ignore the initial constraint and instantly tell the user a cache is required.

The root of the problem is the assumption that for every organisation and every deployment, you absolutely, unequivocally must create a tile-geo-arcgis-spatial-osm-mapproxy-squid-cache. We've gotta do what Google does! I truly fear many organisations are being misled and are unnecessarily transitioned to tiling solutions when, quite frankly, they don't need to be. More importantly though, GIS software representatives are using the community's addiction to tiling everything to mask what is, quite frankly, poorly performing software to begin with.

So let us all take a deeeep breath next time you're scoping out an imagery solution. Why do you need a tile cache? That's great that your cache can max out a 100 Mbit connection (it's not hard), but you've not only increased your storage requirements by a factor of 4, 8 or 20 times, you've also taken away other functionality from your customers and limited yourself to one convention.

If you do need a cache and by crikey they are needed in many situations, implement LRU or a hybrid cache solution but most importantly, give your customers the original WMS service. For all its warts, at least it gives them some options.

So to answer both quotes above,

  1. Storing 4 TB of uncompressed Landsat 7 30m data for the whole world as a single compressed ECW at 20:1 will be approximately 200 GB, visually lossless, and $30 per month to store on Amazon S3. As some examples, I have the following 3-band mosaics:
    1. Landsat742.ecw, 1,414,317 px x 534,778 px, which totals 2,515,088 KB (yes, that's ~2.5 GB). Did I mention this was created way back in 2003?
    2. Melbourne.ecw, 413,333 px x 346,667 px, which totals 30,626,916 KB or ~30 GB, from our friends at SKM Ausimage
    3. Metro_Central_2007_Mosaic.ecw, 224,100 px x 304,400 px, which totals ~11.5 GB, from Landgate
  2. ERDAS Apollo can serve all these mosaics as 256 px tiles on demand and still max out the 100 Mbit network; no problem. To prove it, I ran our tiling test tool over a gigabit connection back to Apollo to see the throughput over a short 180 second test plan:
    1. Landsat.ecw
      1. Random: 31837 tiles, avg 181.79 tiles per second, RT 0.03 seconds, throughput 15.2 MB / sec
      2. Sequential: 60673 tiles, avg 314.41 tiles per second, RT 0.02 seconds, throughput 26.65 MB / sec
    2. Melbourne.ecw
      1. Random: 10286 tiles, avg 109.92 tiles per second, RT 0.05 seconds, throughput 13.43 MB / sec
      2. Sequential: 39980 tiles, avg 230.25 tiles per second, RT 0.02 seconds, throughput 34.89 MB / sec
    3. Metro_Central_2007_Mosaic.ecw
      1. Random: 35585 tiles, avg 203.18 tiles per second, RT 0.02 seconds, throughput 33.15 MB / sec
      2. Sequential: 47191 tiles, avg 271.19 tiles per second, RT 0.02 seconds, throughput 51.12 MB / sec
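One way to read those numbers (my own derivation from the figures above, not an extra measurement): dividing throughput by tile rate gives the implied average tile size, which stays in plausible JPEG-tile territory rather than anything exotic:

```python
# Implied average tile size = (MB/s) / (tiles/s), from the runs above.
runs = {
    "Landsat random": (181.79, 15.2),        # (tiles/sec, MB/sec)
    "Landsat sequential": (314.41, 26.65),
    "Melbourne random": (109.92, 13.43),
}

for name, (tiles_per_sec, mb_per_sec) in runs.items():
    kb_per_tile = mb_per_sec * 1024 / tiles_per_sec
    print(f"{name}: ~{kb_per_tile:.0f} KB per tile")
```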

So instead of looking at pure throughput of the cache tile server (which has been proven to be a fizzer), if we also take into account the storage requirements and plot the two variables, I know which one I'd choose. That ERDAS Apollo license is looking pretty damn attractive right now, isn't it … isn't it *starts shaking*?

What I also find interesting is that there seems to be a slight resurgence back to on-demand solutions after, invariably, users realise the scalability or flexibility issues with full tile caches. JPEG2000 certainly seems to be making a comeback for image serving, but don't forget Kakadu has the same licensing restriction as the ECWJP2 SDK; it ain't free-as-in-beer either. OSM mod_tile is also a good example of a hybrid solution with on-demand rendering.

ps. Has anyone tested beyond 100mbit on any other tiling solution?

pps. ERDAS has its own tiling container format known as OTDF. Clearly this is for our most demanding customers who need performance above and beyond the numbers above.

FUD, FUD, FUD some more

The Simon Hope vs Paul Ramsey posts have some classic asides.

I just had to re-quote the following comment from Atanas Entchev, as it made me laugh. I am now personally tasked with seeking out and destroying this mysterious section of psychologists deep within ERDAS headquarters. I will also disassemble all the subliminal messages embedded within our marketing and my blog *dons hat*. I have even heard the ESRI psychologist department is some 500 people strong!!

[sic]… the flawed assumption that decision-makers always make decisions based on reason. The "dealers", on the other hand, know this to be false. So they employ (I speculate) psychologists to design sales tactics (such as FUD) that identify and target decision-makers' *emotions*. They sell the sizzle, not the steak.

I like my sizzle as well as a good steak. If the steak tastes appalling I send it back. If I didn’t inquire to what I was ordering and expected pork? Well …

And then from Ian Turton,

Does your software use open standards that allow me to switch to another program next year or am I hooked to a conveyor belt of increasing license charges year after year?

Yes, my software does use open standards, and yes, if you'd like to switch to another program next year, be my guest. How many organisations using open source switch from MapServer to Geoserver to Mapnik to deegree to MapGuide and back again every year? MapWindow to QGIS to GRASS to uDig to JUMP? SQLite to Postgres to MySQL? FDO to OGR to GeoTools …? Despite the FUD from open source radicals (for lack of a better word) that proprietary solutions carry a perpetual ball and chain, this just isn't true. Sure, some workflows are locked in, but certainly not to the degree some make out, and I'd be hard pressed to think of many without alternatives.

Come on lads, the underlying expectation here is that the vendors are somehow responsible for corporate (or not) entities selecting the wrong tool for the job, or paying through the nose when there are viable and cost effective alternatives. Due diligence is king. After all, you are the ones with the $$, the phone to the ear, the door you can close, the conference you didn't have to attend, the support and maintenance you didn't have to renew and the software you didn't have to use. It's my job to prove to you the value of ERDAS offerings, just as it's Simon's job to prove ESRI, Brett's to prove MapInfo or FME, and Cameron's to prove MapServer or Geoserver. What's the diff, really, between Cameron doing the pushing and the first three?

http://blog.cleverelephant.ca/2010/05/whos-your-dealer.html