Difference between revisions of "Talk:Air Quality/Chemistry Naming Conventions"
Latest revision as of 08:43, January 9, 2007
Go to Agreed Items of Discussion on Air_Quality/Chemistry_Naming_Conventions - General .
Martin Schultz playing the devil's advocate
Hi,
Very good! It is becoming more and more clear to me that a lot of systematic thinking already went into the CF standards (and certainly Jonathan deserves a lot of credit for this). Yet, I am still a bit sceptical whether this can really gain acceptance by the larger community if they need to adapt so thoroughly and get rid of many old habits and custom units. Microsoft also made its fortune by challenging the customer with small changes at a time and sacrificing the perfect system for a better chance to drag the crowd along. Translated to our present problem, I am still wondering if it wouldn't be better to define some non-udunits "interim standards" just to keep people happy. And if they swallow the first bite and implement CF in their models and tools, one can then, in a few years' time, work on making the system more stringent.
My concern is also related to the non-existence of suitable evaluation tools which will make good use of all the nice attributes and standard names. More and more I get the impression that we are trying to model too much semantic sophistication into the definitions, which makes it practically impossible to project them onto software code, as the complexity of this code must be quite large from the start.
Yet another concern is my experience with improper netCDF files. Every error that can be made will be made at some point, and if we rely too much on the meaning of attributes, we are certain to get garbage results quite soon. One can of course implement some checking for consistency etc., but I see it as highly improbable that one will be able to catch all errors, and the system is becoming complex enough that it will be difficult to diagnose an error and correct it. Just two simple examples of what can easily go wrong:
(1) a certain software tool requires the ordering of levels from top to bottom, and thus you need a small program to reverse the order of the hybrid coefficients and all model fields. Since you are under pressure to deliver results, you will not worry about the attributes, and immediately your "direction:up" will be wrong. The file is still a "good" file in the sense that the plotting software can read it and will always display the correct information for a chosen level. Yet, if you want to take advantage of the "direction" attribute, you will be misled.
(2) assume you have a set of files with accumulated deposition fluxes ("amount" according to the new proposal). For a multi-year average of monthly values, you could for example use ncea from the NCO tools. Hardly anyone will afterwards think about the necessary adaptation of the standard name or the cell_methods field (and how would you write this? "mean_of_sum"? Impossible for any plotting program to "understand" this!).
OK: my message is:
(a) try to keep it simple,
(b) avoid redundancies,
(c) differentiate between tags for automated processing and tags for human information,
(d) provide very clear guidelines as to when a file is CF compliant and which standards are mandatory and which are optional. Perhaps one should think about multiple "compliance levels": level 0 would be the bare-bones basics, level 1 would fulfill a certain set of elements necessary for standard automated processing, level 2 would include all tags amenable to automated processing, and level 3 would include correct tags for human information.
Don't misunderstand me, please! I am very much interested in seeing this happen (else I wouldn't reply at all). I am only playing the devil's advocate here.
Best regards,
Martin
Martin Schultz 4 July 2006 (EDT)
Christiane Textor's answer
Hi,
just a short answer:
1) An "interim standard" cannot be called a "standard" anymore; we should not create confusion.
2) We do not only ask people to do additional work, but also offer a lot of service to them when we analyse their models; this might also make them happy.
3) the non-existence of suitable evaluation tools: There are tools existing: I am in contact with people from PCMDI and will probably be able to provide some routines to map standard_names to variable names for use in the existing analysis tools (like IDL).
4) I fully agree with the statement "Every error that can be made will be made at some point". But this is independent of the CF conventions; on the contrary, CF helps to minimize errors. Of course other tools, like Automod, would do some basic checks of whether the data are OK (e.g. for the vertical axis it is enough to check whether the pressure is decreasing with height).
5) I agree that the averaging of "amount" variables could be a problem, but it would not help much to include the time period in the unit (e.g. kg/m2/month). A solution would be to include the averaging period in the variable name.
6) keep it simple: a very good idea, but levels of compliance do not seem very simple to me...
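The basic vertical-axis check mentioned in item 4 could look roughly like the following. This is only a minimal sketch, not the actual Automod code: the function name `level_order` and the returned labels are hypothetical, and the pressure column is assumed to be given as a plain list in storage order.

```python
def level_order(pressures):
    """Classify a pressure column by how it changes along the level index.

    Returns 'bottom_to_top' if pressure strictly decreases with index,
    'top_to_bottom' if it strictly increases, and 'invalid' otherwise.
    (Function name and labels are hypothetical, chosen for illustration.)
    """
    pairs = list(zip(pressures, pressures[1:]))
    if not pairs:
        return "invalid"  # fewer than two levels: nothing to compare
    if all(a > b for a, b in pairs):
        return "bottom_to_top"
    if all(a < b for a, b in pairs):
        return "top_to_bottom"
    return "invalid"


column = [1000.0, 850.0, 500.0, 200.0]  # hPa, surface first
print(level_order(column))              # storage order of the file
print(level_order(column[::-1]))        # same column after reversing levels
print(level_order([1000.0, 500.0, 850.0]))
```

A check of this kind accepts either storage order and only flags columns that are not monotonic, so a file whose levels were reversed for a plotting tool would still pass.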
In summary, I think there should be only one standard. The CF names have the goal of being as clear as possible to avoid mistakes, and I feel that our virtual working group is very efficient in fulfilling these requirements: Thanks to you all.
Best regards, Christiane
Christiane Textor 4 July 2006 (EDT)
Comments of Jonathan Gregory
Dear Christiane and Martin
I agree with Christiane's comments.
>> 3) the non-existence of suitable evaluation tools:
>> There are tools existing
In particular there is the CF-checker, which verifies conformance to the standard in a "syntactic" sense, as specified by the conformance document http://www.cgd.ucar.edu/cms/eaton/cf-metadata/conformance-req.html
There is also the CMOR F90 library written by PCMDI to help people write CF-compliant netCDF more easily.
>> 5) I agree that the averaging of "amount" variables could be a problem -
This problem comes up with other amount variables too, like precipitation. One solution could be to recognise that if you are averaging them, you are maybe treating them as a rate, not an amount. Hence the standard name should be that of the rate, not the amount. The unit does not have to be kg m-2 s-1. It could be kg m-2 day-1, for instance. Although udunits allows "month" we don't recommend it because its definition is not a calendar month but a particular (constant) number of seconds, which is probably not what you want. But the time bounds of the variable should always indicate the averaging period.
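To illustrate the amount-versus-rate point, here is a minimal sketch that converts monthly accumulated amounts (kg m-2) into daily rates (kg m-2 day-1) using the true calendar length of each month. The function name and the sample numbers are made up for illustration; they do not come from any HTAP or CF tool.

```python
import calendar


def monthly_amount_to_daily_rate(amount_kg_m2, year, month):
    """Convert a monthly accumulated amount (kg m-2) to a rate (kg m-2 day-1).

    Uses the actual number of days in the calendar month, rather than a
    fixed-length "month" unit. (Hypothetical helper, for illustration only.)
    """
    days_in_month = calendar.monthrange(year, month)[1]
    return amount_kg_m2 / days_in_month


# Made-up accumulated deposition for January and February 2006.
jan_rate = monthly_amount_to_daily_rate(62.0, 2006, 1)  # 31 days
feb_rate = monthly_amount_to_daily_rate(56.0, 2006, 2)  # 28 days
print(jan_rate, feb_rate)
```

In this made-up case the two monthly amounts differ (62 versus 56 kg m-2) only because the months differ in length; expressed as rates they are identical, which is why a rate is the more natural quantity to average.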
With both rates and amounts, climatological time bounds may help as well, with which you can record that it is (for instance) the January mean over a number of years (see CF 7.4). I hope that tools such as nco may be extended to produce the cell_methods attribute to describe this, since CF is becoming quite important; in fact we could request such an extension.
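As a rough sketch of what the climatological metadata of CF section 7.4 might look like for a multi-year monthly mean, the following builds the bounds and the attribute strings as plain Python values. The helper name `climatology_bounds` and the bounds variable name are hypothetical; the `climatology` attribute and the "within years / over years" form of cell_methods follow CF 7.4 as I understand it, so please check against the conventions document before relying on it.

```python
from datetime import date


def climatology_bounds(month, first_year, last_year):
    """Return (start, end) bounds for a multi-year mean of one calendar month.

    The interval runs from the start of the month in the first year to the
    end of the same month in the last year. (Hypothetical helper.)
    """
    start = date(first_year, month, 1)
    if month == 12:
        end = date(last_year + 1, 1, 1)
    else:
        end = date(last_year, month + 1, 1)
    return start, end


# Attributes one might attach; the variable name is an assumption.
time_attributes = {"climatology": "climatology_bounds"}
data_attributes = {"cell_methods": "time: mean within years time: mean over years"}

print(climatology_bounds(1, 2000, 2005))
print(data_attributes["cell_methods"])
```

For an accumulated amount, the first method could presumably be "sum within years" instead of "mean within years", which would express the "mean_of_sum" case raised above; that reading is an assumption on my part.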
>> (1) a certain software tool requires the ordering of levels from top to
>> bottom, and thus you need a small program to reverse the order of the
>> hybrid coefficients and all model fields. Since you are under pressure
>> to deliver results, you will not worry about the attributes, and
>> immediately your "direction:up" will be wrong. The file is still a
>> "good" file
The "positive" attribute is not affected by the ordering of the coordinates. It indicates only whether larger or smaller values mean up or down.
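A small sketch may make this concrete. It assumes a pressure coordinate with positive = "down"; the helper name `top_level` is hypothetical.

```python
def top_level(values, positive):
    """Return the coordinate value of the physically highest level.

    `positive` is the CF attribute value: "up" means larger values are
    higher, "down" means smaller values are higher. The storage order of
    `values` plays no role. (Hypothetical helper, for illustration only.)
    """
    if positive == "up":
        return max(values)
    if positive == "down":
        return min(values)
    raise ValueError("positive must be 'up' or 'down'")


pressure = [1000.0, 850.0, 500.0, 200.0]  # hPa, stored bottom to top
reordered = pressure[::-1]                # same levels, stored top to bottom

# The attribute stays "down" in both cases and gives the same answer.
print(top_level(pressure, "down"), top_level(reordered, "down"))
```

So reversing the level order for a plotting tool does not make the attribute wrong, provided the coordinate values are reversed together with the data.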
>> (a) try to keep it simple, (b) avoid redundancies
Both of these are principles of CF. See http://www.cgd.ucar.edu/cms/eaton/cf-metadata/clivar_article.pdf
>> (c) differentiate between tags for automated processing and
>> tags for human information
We try to provide both at once i.e. metadata which is precise for programs but also intelligible to humans. This minimises redundancy.
>> (d) provide very clear guidelines as to when
>> a file is CF compliant and which standards are mandatory and which are
>> optional
There is only one kind of compliance defined at present, because most features of CF (beyond COARDS) are optional, for backward compatibility. But in a particular application or project you could of course insist on certain features or choices within the standard.
Thanks for your comments. Best wishes
Jonathan
Jonathan Gregory 5 July 2006 (EDT)
Mail from Christiane Textor to Jonathan Gregory on July 10, with answers from July 11
Some of the items of the original mail appear in other discussions to which they pertain.
CF-COMPLIANCE CHECKER
CT: After discussion with HTAP people, Martin Schultz in particular, I feel that it would be nice if the compliance checker could be more informative. It would be helpful to obtain more information on why compliance is not reached. Is this possible?
JG: It depends on software engineering effort. Of course, I agree with you. Do you have any effort available from your project? I would hope that either Kyle or Alison might be able to work on this at some point, but at present both of them are still learning about CF, so I don't foresee any immediate help.
CF HOME PAGE AND DOCUMENTATION
CT: It would be good to have a very simple and short summary of what the main objectives of CF are and what CF compliance means. The documents provided online are not that easy to understand, and there are many of them; I found 6 relevant links, see http://wiki.esipfed.org/index.php/Air_Quality/Chemistry_Naming_Resources
Would it be possible to have one simple and short version for CF beginners?
JG: Again, this would depend on someone else being spun-up enough to write it. Do you think you or anyone else might produce a draft? http://www.cgd.ucar.edu/cms/eaton/cf-metadata/clivar_article.pdf (on the CF home page) is supposed to be something anyone could understand, and it states the objectives of CF near the start. Is this doc too complicated?
CT: I still have some questions. For example, in section 4 of the http://www.cgd.ucar.edu/cms/eaton/cf-metadata/CF-current.html document, no standard_names are given; instead long_names are used. Do I misunderstand something?
JG: No. Some of the examples were written before standard names were defined. Standard names are optional, though.
nco
CT: ... nco is essential for us! ...
JG: I don't expect that nco will recognise the cell_measures attribute, if you mean using it to do global sums etc. However, your own analysis software could use this, couldn't it? If you think nco should be extended, please write to them to ask for it. They are aware of CF.
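As a rough idea of how one's own analysis software could use cell_measures for a global sum, here is a minimal sketch. It assumes the attribute has the form "area: some_variable_name" and that the flux and the cell areas are already available as flat lists of equal length; the function names, the variable name "cell_area" and the numbers are hypothetical.

```python
def parse_cell_measures(attribute):
    """Parse a cell_measures string such as 'area: cell_area' into a dict.

    The string is read as whitespace-separated 'measure: variable' pairs.
    (Hypothetical helper; no netCDF library is involved here.)
    """
    tokens = attribute.split()
    return {
        tokens[i].rstrip(":"): tokens[i + 1]
        for i in range(0, len(tokens) - 1, 2)
    }


def global_total(flux_per_area, cell_area):
    """Sum a per-area quantity over all grid cells, weighted by cell area."""
    return sum(f * a for f, a in zip(flux_per_area, cell_area))


measures = parse_cell_measures("area: cell_area")
print(measures["area"])  # name of the variable holding the cell areas

# Made-up values: flux in kg m-2 s-1, areas in m2.
print(global_total([1.0, 2.0], [10.0, 20.0]))
```

The point of the attribute is that the analysis code does not have to guess which variable holds the grid-cell areas; it can look the name up in the file.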