From timj@gtk.org Sun Jul 30 07:25:52 2000 Date: Sun, 30 Jul 2000 06:23:37 +0200 (CEST) From: Tim Janik To: Hacking Gnomes Subject: proposed getter function cleanups hi all, i'd like to finalize the getter function issue. we've had quite some input on the thread "gtk_label_get_text() string duping", and i'm herewith trying to sum up the points raised, account for them, and compose a proposal based on that to fix the current situation. current status ============== we're currently in the position of having half of our string getter functions return references and the other half (mostly newly added ones) returning copies, all using the common _get_() naming scheme. that is pretty inconsistent and therefore unintuitive for current and new develoeprs, leading to: - leaks in places where developers are not aware that they have to free the string returned - some programmers doing intensive research to figure whether they need to free strings from a getter or not - occasional crashes where strings are freed that were returned by reference reasoning ========= though this is probably going to to be obvious to some, it might still halp matters to outline why accessors and their ease of use represent an important aspect of our APIs: a) we need accessors, often to hide binary incomaptibilities and preserve implementation encapsulation. we also need them, because programmers using foreign widgets (i.e. non self-implemented ones) usually don't know what parts of an object structure are meaningfull and public, or are private respectively. b) people, using our API, shouldn't need to dig into the source to find out whether they have to free return values (sometimes very hard to figure from pure code), or lookup the function in the reference doc (which might not have evolved to a standard where resource management aspects are covered comprehensively). instead, the behaviour has to be evident from naming conventions. c) if we provide accessors for simple object fields which do nothing but encapsulting fields bundled with copying burden, API users are more likely to circumvent accessor functions to cut down on resource maintenance, thus defeating their prime purpose. d) to provide ease of use, i'm personally leaning strongly towards returning values by reference from getters in general and for strings specifcally. there are various aspects that lead me here: - to make our getter API consistent on a global scale, changing just the string returning ones introduces the least amount of work and breakage required, since for the most part, objects, lists etc. are already returned by reference. - usually, extra duplications isn't required. at least in my experience from dealing with the Gtk/GNOME API, and from reviewing other people's code, getter return values are most frequently used to be passed in to mutators that do their own duplication anyways. - return value duplication and thus requirement on the caller side to keep values around across single function invocations for the sole purpose of conforming memory management, defeats convenient use of the API. that is, accssors that do uneccesary duplication inhibit convenient nesting of functions which can be quite a saver for code portions that deal with lots of object state information. in literal code: gchar *tmp_string = gtk_label_get_text (label); g_print ("there's the text: %s\n", tmp_string); g_free (string); is a whole lot more tedious than simple function nesting: g_print ("there's the text: %s\n", gtk_label_get_text (label)); not to forget, that the above two extra lines have to be written for *every* use of gtk_label_get_text(), introducing a huge unnecessary overhead in user code. that being said, it is less my point to play the hardliner and enforce reference return values all over the place. if, for whatever reason, the a general consensus builds up to go for duplication all over the place, we'd still reach this proposal's primary goal of providing a consistent API. proposal ======== the following (or a similar) set of conventions should be added to the GNOME coding style guidlines: 1) functions named *_get_*, return values by reference for everything, including strings, objects, lists etc. if duplicates are required, the programmer can always do g_strdup(), g_object_ref(), g_list_copy() and so forth on the value returned 2) (lifetime specification) values returned by reference, such as lists, arrays, strings or objects from getter functions have to be duplicated or referenced to remain valid before new functions on the getter object or its associated object tree (children, parents) are invoked, or object specific locks are released and reaquired. 3) accessors that have to compose the returned value into newly allocated storage space must not be named *_get_*. instead it is recommended to name them after the action they perform, e.g. gchar* g_path_dup_dirname(), or GList* gtk_container_list_children(). frequent substitutes for _get_ here would be: *_dup_*, *_list_*, *_create_*, *_make_*, *_from_*, *_concat_*, *_strip_*, etc.. [further suggestions are apprechiated, especially as there are more complex examples, e.g. where list elements are also duplicated, but see 5) on this] 4) for compatiblity, some functions may still return duplicated values and thus leak if their return values aren't freed. those functions can be spotted through compiler warnings, enabled through certain module specific defines, usually _DISABLE_COMPAT_H. as an example, passing -DGLIB_DISABLE_COMPAT_H to the compiler causing foo.c:267: warning: no previous prototype for `g_path_get_dirname' can be fixed by s/g_path_get_dirname/g_path_dup_dirname/ in foo.c. implementation wise, this is achived by glib-compat.h containing: #ifndef GLIB_DISABLE_COMPAT_H # define g_path_get_dirname g_path_dup_dirname #endif 5) IMO, we're unlikely to come up with a generic convention that produces intuitive names for duped lists, duped lists with duped elements, duped lists with duped elemnts themselves being arrays/lists of duped elements etc... we may want to introduce a convention along the lines of: accesors that return complex values, e.g. lists, arrays etc. most likely need their caller to free the returned values up to a certain depth. here the function's documentation has to be consulted to figure the specifc requirement contra aspects ============== On Mon, 26 Jun 2000, Darin Adler wrote: > One reason a call that returns a copy is safer than a peek-tpye call is that > a peek-type call needs to document how long the returned value is good. This > is simple for a "get widget" call, where the widget's lifetime is tied to > the lifetime of the object you get it from. But for string calls it can be > subtle. An example of this is old version of gtk_entry_get_text in the gnome > 1.2 branch which has to manage a special variable for the returned text. How > long is the text returned good for? In practice it can be a real bug if > someone tries to use this returned text as a key for a hash table, for > example. The caller knows that you don't need to free the text but doesn't > know what its lifetime is. darin is right in raising the lifetime issue for reference-value returns. i've accounted for that under point 2) of the proposed conventions, values have to be duplicated/referenced before new getters/mutators are being invoked. in practice, this leaves enough room for the usual cases to avoid duplication (where return values are used as arguments for mutators), and of course, this is not a string-specific issue either, i.e. GtkWidget *label = GTK_BIN (button)->child; gtk_widget_set (button, "label", "hello, die child!", NULL); gtk_widget_show (label); screwes you just as much as trying to keep the label's string around, instead of the label itself. On 26 Jun 2000, Havoc Pennington wrote: > I do think it's unfortunate to expose to callers of a function whether > or not the string happens to already be stored somewhere. This is a > pretty strong argument for the always-return-a-copy behavior for _get_ > IMO. Perhaps we should look at renaming the non-copying functions > _peek_ instead of renaming the copying functions _dup_. > > Havoc i don't see how this applies to strings only either, that same argument can be made to do a deep-copy of every value returned by any function which, for simplicity and performance gains, is clearly something we want to avoid. having a global convention of how to deal with returned values efficiently is likely to be more fortunate ;) and i wouldn't tout exposal of the allocation nature of a returned value in the API as bad practice. in one way or another you'll always have to do that for non-GC languages. for example, there are plenty of libc man pages available to show you that resource maintenance has indeed be taken into account by the caller in a function specific fashion (and for what it's worth, duplication is generally avoided there if possible). another issue that had been raised, was problems with values by reference in threaded programs. yet again, that's also not a string specific issue, but more importantly, for object systems, such as widgets derived from GtkObject, that is already being accounted for with the global lock that has to be held across object accesses. object, string, etc. references have of course to be duplicated or referenced across lock releasing and reacquiring (which is also mentioned under 2) in the proposed cnoventions). in reality, this issue mostly only affects us in the glib API, where for the most part duplications are used with functions not named *_get_*, or, for *_get_* functions, static values are being returned. remainings ========== i personally would apprechiate former suggestions on the deep copying naming issue raised by george: On Sun, Jun 25, 2000 at 11:13:49PM -0700, George wrote: > Hmm, thought about this more. > > 1) This should apply to string vectors as well. > > 2) Now what about GList's. There are a bunch of functions that return these. > And there are 3 cases. > > a) A reference to an internal list > b) A newly allocated list of references of objects/strings > c) A newly allocated list of reffed/new objects or duped strings > > What to do about those? get for a), and dup for c) and remove b? Or do > we just give up in this case and just make the user read the docs. A > nicer way to solve this could be to provide a utility GLib type which > has a reference to the glist and knows if it should free the list, the > data or both and with what functions. > > George --- ciaoTJ