Version 0.8, March 06, 2017
Albert Graef <aggraef@gmail.com>
This package provides a light-weight, no frills interface to the C++ dictionary containers map
and unordered_map
. The stldict module makes these data structures available in Pure land and equips them with a (more or less) idiomatic Pure container interface.
The C++ containers are part of the standard C++ library, see the C++ standard library documentation for details. They were originally based on the Standard Template Library, so they are also sometimes referred to as “STL containers”; hence the name of this package.
Copyright (c) 2011 by Albert Graef.
pure-stldict is free software: you can redistribute it and/or modify it under the terms of the GNU Lesser General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.
pure-stldict is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more details.
You should have received a copy of the GNU Lesser General Public License along with this program. If not, see <http://www.gnu.org/licenses/>.
Get the latest source from https://bitbucket.org/purelang/pure-lang/downloads/pure-stldict-0.8.tar.gz.
Run make
to compile the modules and make install
(as root) to install them in the Pure library directory. This requires GNU make, and of course you need to have Pure (and a C++ library which includes the STL) installed.
make
tries to guess your Pure installation directory and platform-specific setup. If it gets this wrong, you can set some variables manually, please check the Makefile for details.
Note: This module requires Pure 0.50 or later and a recent version of the C++ library (GNU libstdc++ v3 has been tested). All proper C++11 libraries should work out of the box, while (recent) C++0x implementations may require some fiddling with the sources and/or the compilation options. Pre C++0x library versions surely require considerably more work, especially in the hashdict module.
After installation, you can use the operations of this package by placing the following import declaration in your Pure programs:
using stldict;
This imports the whole shebang. If you only need either the hashed or the ordered dictionaries, you can also import the corresponding modules separately, i.e.:
using hashdict;
or:
using orddict;
In Pure land, the C++ map
and unordered_map
containers and their multimap
variants are made available as a collection of four data structures:
hashdict
, hashmdict
hashdict.pure
module.
orddict
, ordmdict
<
predicate, like the ordered dictionary and set data structures in the standard library, and can be found in the orddict.pure
module.
Note that hashdict
and hashmdict
differ in that the former has exactly one key-value association for each key in the dictionary, while the latter is a “multidict” which allows multiple values to be associated with a key. The same applies to the orddict
and ordmdict
types.
In addition, there are various supertypes which correspond to different unions of the hashed and ordered dictionary types. These are:
hashxdict
, ordxdict
stldict
, stlmdict
stlxdict
For instance, you can use hashxdict
to match both hashdict
and hashmdict
values. Likewise, stlmdict
matches both hashmdict
and ordmdict
values. To match any kind of dictionary, use the stlxdict
type.
These data structures are very thin wrappers around the C++ container types; in fact, they are just pointers to the C++ containers. Memory management of these objects is automatic, and customizable pretty-printing is provided as well.
All data structures offer most of the usual Pure container interface (as well as some extensions). In contrast to the standard library dictionaries, they can be used both as dictionaries (holding key => value pairs) and sets (holding only keys, without associated values), even at the same time.
The other important difference to the standard library containers is that the stldict containers are mutable data structures; inserting and deleting members really modifies the underlying C++ containers. (However, it is possible to take copies of the containers in situations where it’s necessary to preserve value semantics.)
All types of dictionaries are simply pointers to the corresponding C++ containers which hold key-value associations where both keys and values may be arbitrary Pure expressions. The basic operations described below can be used to create, query and modify these objects. Comparisons of dictionaries are implemented as well, and the set-like operations let you combine dictionaries in different ways. These operations provide an interface similar to the usual Pure container API.
In addition, the stldict module provides some list-like operations on dictionaries, so that the member data can be processed and aggregated in a convenient fashion (including the ability to use dictionaries as generators in list and matrix comprehensions), and there’s also an interface to C++ iterators which enables you to traverse, inspect and modify the containers in a more C++-like way. Some low-level operations are available to access information about the underlying hash table of a hashed dictionary. Last but not least, the module also offers some operations to customize the pretty-printing of dictionary values.
When working with these data structures, please note the following special properties of this implementation:
copy
function if you need to preserve the original value.members
, keys
and vals
operations.x=>y
as a member of a set, since it always denotes a key-value association. As a remedy, you may use ordinary pairs (x,y)
instead.hashdict xs
, hashmdict xs
, orddict xs
, ordmdict xs
Create a dictionary of the corresponding type from a list, tuple or vector of its members. Members can be specified as hash pairs x=>y
to denote a key-value association. Any other kind of value denotes a singleton key without associated value. Note that the ordered dictionaries require that the keys be ordered, i.e., the <
predicate must be defined on them.
The same operations can also be used to construct a dictionary from another dictionary of any type. If the given dictionary is already of the corresponding type, this is a no-op (if you want to copy the dictionary instead, use the copy
function below). Otherwise the given dictionary is converted to a new dictionary of the desired target type.
mkhashdict y xs
, mkhashmdict y xs
, mkorddict y xs
, mkordmdict y xs
y
as the value for each key.
copy m
Create a new dictionary with the same type and content as m
. This is useful if you want to preserve value semantics when using destructive update operations such as insert
and delete
. In such a case, copy
can be used to take a copy of the dictionary beforehand, so that the original dictionary remains unmodified.
Note: This operation needs linear time with respect to the size of the dictionary (i.e., its number of members). If logarithmic update times are needed while still preserving value semantics, you should use the dictionary and set data structures from the standard library instead.
hashdictp m
, hashmdictp m
, orddictp m
, ordmdictp m
hashxdictp m
, ordxdictp m
, stldictp m
, stlmdictp m
stlxdictp m
# m
m ! x
x
in the dictionary m
. This may be x
itself if x
is a member of m
but has no associated value. In the case of a multidict this actually returns a list of values (which may be empty if m
doesn’t contain x
). Otherwise an out_of_bounds
exception is thrown if m
doesn’t contain x
.
null m
m
is empty, i.e., has zero members.
member m x
m
contains a member with key x
.
members m
, list m
m
. The member list will be in an apparently random order in the hashed dictionary case, while it is guaranteed to be in ascending order (by key) for ordered dictionaries. The same order is also used for the other inspection operations below.
stream m
list
, but the member list is returned as a lazy list (cf. Lazy Evaluation and Streams) whose members will be computed on the fly as the list is being traversed; cf. Iterators.
tuple m
, vector m
keys m
vals m
x
without associated value, x
itself is returned instead.
As already mentioned, the following modification operations are destructive, i.e., they actually modify the underlying dictionary data structure. If this is not desired, you’ll first have to take a copy of the target dictionary, see copy
.
insert m x
, insert m (x=>y)
, update m x y
x
or a key-value pair x=>y
into m
and return the modified dictionary. This always adds a new member in a multidict, otherwise it replaces an existing value if there is one. update
is provided as a fully curried version of insert
, so update m x y
behaves exactly like insert m (x=>y)
.
delete m x
, delete m (x=>y)
x
or the specific key-value pair x=>y
from m
(if present) and return the modified dictionary. In the multidict case, only the first member with the given key x
or key-value pair x=>y
is removed.
clear m
m
, making m
an empty dictionary. Returns ()
.
The usual comparison predicates (==
, ~=
, <=
, <
etc.) are defined on all dictionary types, where two dictionaries are considered “equal” (m1==m2
) if they both contain the same key=>value
pairs, and m1<=m2
means that m1
is a sub-dictionary of m2
, i.e., all key=>value
pairs of m1
are also contained in m2
(taking into account multiplicities in the multidict case). Ordered dictionaries compare keys using equality (assuming two keys a
and b
to be equal if neither a<b
nor b<a
holds), while hashed dictionaries check for syntactical equality (using ===
). The associated values are compared using the ==
predicate if it is defined, falling back to syntactic equality otherwise.
The module also defines syntactic equality on all dictionary types, so that two dictionaries of the same type are considered syntactically equal iff they contain the same (syntactically equal) members in the same order. This is always guaranteed if two dictionaries are “identical” (the same C++ pointer), but generally the member order will depend on how the dictionary was constructed. Thus if you need to check that two dictionaries contain the same members irrespective of the order in which the members are listed, the semantic equality operation ==
should be used instead; this will also handle the case of mixed operand types.
Note that if you really need to check whether two dictionaries are the same object rather than just syntactically equal, you’ll have to cast them to generic C pointers before comparing them with ===
. This can be done with the following little helper function:
same_dict x y = pointer_cast "void*" x === pointer_cast "void*" y;
These operations work with mixed operand types, promoting less general types to more general ones (i.e., ordered to hashed, and single-valued to multi-valued dictionaries). The result is always a new dictionary, leaving the operands unmodified.
m1 + m2
m1+m2
adds the members of m2
to m1
.
m1 - m2
m1-m2
removes the members of m2
from m1
.
m1 * m2
m1*m2
removes the members not in m2
from m1
.
The following operations are all overloaded so that they work like their list counterparts, treating their dictionary argument as if it was the member list of the dictionary:
do
, map
, catmap
, listmap
, rowmap
, rowcatmap
, colmap
, colcatmap
all
, any
, filter
, foldl
, foldl1
, foldr
, foldr1
, scanl
, scanl1
, scanr
, scanr1
, sort
Note that this includes the generic comprehension helpers listmap
, catmap
et al, so that dictionaries can be used as generators in list and matrix comprehensions as usual (see below for some examples).
These operations give direct access to C++ iterators on dictionaries which let you query the elements and do basic manipulations of the container. The operations are available in the stldict
namespace.
The iterator concept is somewhat alien to Pure and there are some pitfalls (most notably, destructive updates may render iterators invalid), but the operations described here are still useful in some situations, especially if you need to speed up sequential accesses to large containers or modify values stored in a container in a direct way. They are also used internally to compute lazy member lists of containers (stream
function).
You should only use these directly if you know what you are doing. In particular, make sure to consult the C++ standard library documentation for further details on C++ iterator usage.
The following operations are provided to create an iterator for a given dictionary.
stldict::begin m
, stldict::end m
stldict::end
must always be specified in qualified form since end
is a keyword in the Pure language.)
stldict::find m x
x
in the container and returns an iterator pointing to the corresponding member (or stldict::end m
if m
doesn’t contain x
).
Note that these operations return a new iterator object for each invocation. Also, the created iterator object keeps track of the container it belongs to, so that the container isn’t garbage-collected while the iterator is still being used. However, removing a member from the container (using either delete
or stldict::erase
) invalidates all iterators pointing to that member; the result of trying to access such an invalidated iterator is undefined (most likely your program will crash).
Similar caveats also apply to the stream
function which, as already mentioned, uses iterators internally to implement lazy list traversal of the members of a dictionary. Thus, if you delete a member of a dictionary while traversing it using stream
, you better make sure that this member is not the next stream element remaining to be visited; otherwise bad things will happen.
The following operations on iterators let you query and modify the contents of the underlying container:
stldict::dict i
i
belongs.
stldict::endp i
i
points to the end of the container (i.e., past the last element).
stldict::next i
i
unmodified. The operation fails if i
is already at the end of the container.
stldict::get i
i
(or just the key if there is no associated value). The operation fails if i
is at the end of the container.
stldict::put i y
i
to y
, and return the new value y
. The operation fails if i
is at the end of the container. Note that stldict::put
only allows you to set the associated value, not the key of the member.
stldict::erase i
i
(this invalidates i
and all other iterators pointing to this member). The operation fails if i
is at the end of the container.
i == j
, i ~= j
i == j
) if i
and j
point to the same element in the same container, and unequal (i ~= j
) if they don’t. (In contrast, note that iterators are in fact just pointers to a corresponding C++ data structure, and thus syntactical equality (i === j
) holds only if two iterators are the same object.)
The hashdict module also provides a few specialized low-level operations dealing with the layouts of buckets and the hash policy of the hashdict
and hashmdict
containers, such as bucket_count
, load_factor
, rehash
etc. These operations, which are all kept in their own separate hashdict
namespace, are useful to obtain performance-related information and modify the setup of the underlying hash table. Please check the hashdict.pure
module and the C++ standard library documentation for further details.
By default, dictionaries are pretty-printed in the format somedict xs
, where somedict
is the actual construction function such as hashdict
, orddict
, etc., and xs
is the member list of the dictionary. This is usually convenient, as the printed expression will evaluate to an equal container when reentered as Pure code. However, it is also possible to define your own custom pretty-printing with the following function.
hashdict_symbol f
, hashmdict_symbol f
, orddict_symbol f
, ordmdict_symbol f
f xs
(where xs
is the member list) for printing the corresponding type of dictionary.
Note that f
may also be an operator symbol (nonfix and unary symbols work best). In the case of an outfix symbol the list brackets around the members are removed; this makes it possible to render the container in a format similar to Pure’s list syntax. For instance:
> using stldict;
> outfix {$ $};
> orddict_symbol ({$ $});
()
> orddict (1..5);
{$1,2,3,4,5$}
See orddict_examp.pure
included in the distribution for a complete example which also discusses how to make such a custom print representation reparsable.
Some basic examples showing hashdict
in action:
> using stldict;
> let m = hashdict [foo=>99, bar=>bar 4711L, baz=>1..5]; m;
hashdict [foo=>99,bar=>bar 4711L,baz=>[1,2,3,4,5]]
> m!bar;
bar 4711L
> keys m;
[foo,bar,baz]
> vals m;
[99,bar 4711L,[1,2,3,4,5]]
> list m;
[foo=>99,bar=>bar 4711L,baz=>[1,2,3,4,5]]
> member m foo, member m bar;
1,1
Hashed multidicts (hashmdict
):
> let m = hashmdict [foo=>99,baz=>1..5,baz=>bar 4711L]; m;
hashmdict [foo=>99,baz=>[1,2,3,4,5],baz=>bar 4711L]
> m!baz;
[[1,2,3,4,5],bar 4711L]
> m!foo;
[99]
The following example illustrates how to employ ordered dictionaries (orddict
) as a set data structure:
> let m1 = orddict [5,1,3,11,3];
> let m2 = orddict (3..6);
> m1;m2;
orddict [1,3,5,11]
orddict [3,4,5,6]
> m1+m2;
orddict [1,3,4,5,6,11]
> m1-m2;
orddict [1,11]
> m1*m2;
orddict [3,5]
> m1*m2 <= m1, m1*m2 <= m2;
1,1
> m1 < m1+m2, m2 < m1+m2;
1,1
Of course, the same works with ordered multidicts (ordmdict
):
> let m1 = ordmdict [5,1,3,11,3];
> let m2 = ordmdict (3..6);
> m1;m2;
ordmdict [1,3,3,5,11]
ordmdict [3,4,5,6]
> m1+m2;
ordmdict [1,3,3,3,4,5,5,6,11]
> m1-m2;
ordmdict [1,3,11]
> m1*m2;
ordmdict [3,5]
> m1*m2 <= m1, m1*m2 <= m2;
1,1
> m1 < m1+m2, m2 < m1+m2;
1,1
In fact, the binary operations (comparisons as well as the set operations +
, -
and *
) work with any combination of dictionary operands:
> let m1 = hashdict (1..5);
> let m2 = ordmdict (3..7);
> m1+m2;
hashmdict [1,2,3,3,4,4,5,5,6,7]
Note that the operands are always promoted to the more general operand type, where hashed beats ordered and multi-valued beats single-valued dictionaries. If this is not what you want, you can also specify the desired conversions explicitly:
> m1+orddict m2;
hashdict [1,2,3,4,5,6,7]
> orddict m1+m2;
ordmdict [1,2,3,3,4,4,5,5,6,7]
Also note that the “set” operations not only work with proper sets, but also with general dictionaries:
> hashdict [i=>i+1|i=1..4]+hashdict [i=>i-1|i=3..5];
hashdict [1=>2,2=>3,3=>2,4=>3,5=>4]
All dictionary containers can be used as generators in list and matrix comprehensions:
> let m = hashmdict [foo=>99,baz=>1..5,baz=>bar 4711L];
> [x y | x=>y = m];
[foo 99,baz [1,2,3,4,5],baz (bar 4711L)]
> {{x;y} | x=>y = m};
{foo,baz,baz;99,[1,2,3,4,5],bar 4711L}
Note that in the current implementation this always computes the full member list of the dictionary as an intermediate value, which will need considerable extra memory in the case of large dictionaries. As a remedy, you can also use the stream
function to convert the dictionary to a lazy list instead. This will often be slower, but in the case of big dictionaries the tradeoff between memory usage and execution speed might be worth considering. For instance:
> let m = hashdict [foo i => i | i = 1..10000];
> stream m;
(foo 1512=>1512):#<thunk 0x7fa1718350a8>
> stats -m
> #list m;
10000
0.01s, 40001 cells
> #stream m;
10000
0.1s, 16 cells
> #[y | x=>y = m; gcd y 767~=1];
925
0.05s, 61853 cells
> #[y | x=>y = stream m; gcd y 767~=1];
925
0.15s, 10979 cells