Difference between revisions of "Talk:Air Quality/Chemistry Naming Conventions"

From Earth Science Information Partners (ESIP)
4.7.2007

Revision as of 08:28, July 10, 2006



Martin Schultz playing the devil's advocate

Hi,

very good! It is becoming more and more clear to me that a lot of systematic thinking already went into the CF standards (and certainly Jonathan deserves a lot of credit for this). Yet, I am still a bit sceptical whether this can really gain acceptance in the larger community if people need to adapt so thoroughly and get rid of many old habits and custom units. Microsoft also made its fortune by confronting customers with small changes at a time and sacrificing the perfect system for a better chance to drag the crowd along. Translated to our present problem, I am still wondering if it wouldn't be better to define some non-udunits "interim standards" just to keep people happy. And if they swallow the first bite and implement CF in their models and tools, one can then in a few years' time work on making the system more stringent.

My concern is also related to the non-existence of suitable evaluation tools which would make good use of all the nice attributes and standard names. More and more I get the impression that we are trying to model too much semantic sophistication into the definitions, which makes them practically impossible to map onto software, because the code would have to be quite complex from the start.

Yet another concern is my experience with improper netCDF files. Every error that can be made will be made at some point, and if we rely too much on the meaning of attributes, we are certain to get garbage results quite soon. One can of course implement some checking for consistency etc., but I see it as highly improbable that one will be able to catch all errors, and the system is becoming complex enough that it will be difficult to diagnose an error and correct it. Just two simple examples of what can easily go wrong:

(1) A certain software tool requires the levels to be ordered from top to bottom, and thus you need a small program to reverse the order of the hybrid coefficients and all model fields. Since you are under pressure to deliver results, you will not worry about the attributes, and immediately your "direction:up" will be wrong. The file is still a "good" file in the sense that the plotting software can read it and will always display the correct information for a chosen level. Yet, if you want to take advantage of the "direction" attribute, you will be misled.

(2) Assume you have a set of files with accumulated deposition fluxes ("amount" according to the new proposal). For a multi-year average of monthly values, you could for example use ncea from the NCO tools. Hardly anyone will afterwards think about the necessary adaptation of the standard name or the cell_methods field (and how would you write this? "mean_of_sum"? Impossible for any plotting program to "understand" this!).
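To make the two examples concrete, here is a minimal sketch in plain Python. It does not use netCDF or NCO at all: a variable is modelled as a hypothetical dictionary with `levels`, `values` and `attrs` entries, "up" is assumed to mean that level heights increase with the index, and the text appended to `cell_methods` is only illustrative. The point is merely that the data operation and the attribute update have to happen in the same step.

```python
def reverse_levels(var):
    """Reverse the level order and flip the (assumed) `direction` attribute with it."""
    flip = {"up": "down", "down": "up"}
    attrs = dict(var["attrs"])  # copy, so the input variable is left untouched
    attrs["direction"] = flip[attrs["direction"]]
    return {
        "levels": var["levels"][::-1],
        "values": var["values"][::-1],
        "attrs": attrs,
    }


def direction_matches_order(var):
    """Check the attribute against the data.

    Assumption: "up" means the level heights increase with the index.
    """
    levels = var["levels"]
    increasing = all(a < b for a, b in zip(levels, levels[1:]))
    return increasing == (var["attrs"]["direction"] == "up")


def average_over_years(files):
    """Average accumulated values across years and record that step in cell_methods."""
    n = len(files)
    length = len(files[0]["values"])
    mean = [sum(f["values"][i] for f in files) / n for i in range(length)]
    attrs = dict(files[0]["attrs"])
    # Illustrative wording only; the correct CF spelling is exactly the open question.
    previous = attrs.get("cell_methods", "")
    attrs["cell_methods"] = (previous + " time: mean over years").strip()
    return {"values": mean, "attrs": attrs}


# Example (1): a reversal that forgets the attribute is detectably inconsistent.
field = {"levels": [10.0, 100.0, 1000.0], "values": [1, 2, 3], "attrs": {"direction": "up"}}
stale = {"levels": field["levels"][::-1], "values": field["values"][::-1], "attrs": field["attrs"]}
fixed = reverse_levels(field)

# Example (2): two years of accumulated monthly deposition, averaged.
year1 = {"values": [2.0, 4.0], "attrs": {"cell_methods": "time: sum"}}
year2 = {"values": [4.0, 8.0], "attrs": {"cell_methods": "time: sum"}}
climatology = average_over_years([year1, year2])
```

A check like `direction_matches_order` only catches the first kind of error because the ordering can be read off the data itself. In the second case nothing in the averaged values reveals that they were once sums, so the attribute is the only record.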

OK, my message is:

(a) try to keep it simple,

(b) avoid redundancies,

(c) differentiate between tags for automated processing and tags for human information,

(d) provide very clear guidelines as to when a file is CF compliant and which standards are mandatory and which are optional. Perhaps one should think about multiple "compliance levels": level 0 would be the bare basics, level 1 would fulfil a certain set of elements necessary for standard automated processing, level 2 would include all tags amenable to automated processing, and level 3 would include correct tags for human information.
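The idea of compliance levels in (d) could be sketched roughly as follows. The attribute sets assigned to each level are pure assumptions chosen for illustration; nothing here has been agreed on or taken from the CF documents. A file (reduced here to the attributes of a single variable) reaches level n only if it also satisfies all lower levels.

```python
# Hypothetical attribute sets per level; which tags belong where is an assumption.
REQUIRED_BY_LEVEL = [
    {"units"},                 # level 0: the bare basics
    {"standard_name"},         # level 1: standard automated processing
    {"cell_methods"},          # level 2: all tags amenable to automated processing
    {"long_name", "comment"},  # level 3: tags for human information
]


def compliance_level(attrs):
    """Return the highest level whose requirements, and all lower ones, are met.

    Returns -1 if not even level 0 is satisfied.
    """
    level = -1
    for n, required in enumerate(REQUIRED_BY_LEVEL):
        if not required.issubset(attrs):
            break
        level = n
    return level


example = {"units": "kg m-2", "standard_name": "some_deposition_amount"}
print(compliance_level(example))
```

Making the levels cumulative keeps the answer a single number, which a tool can test with one comparison before deciding whether to trust the attributes.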

Don't misunderstand me, please! I am very much interested in seeing this happen (else I wouldn't reply at all). I am only playing the devil's advocate here.

Best regards,

Martin