Library Reference

First level variables

bcolz.__version__

The version of the bcolz package.

bcolz.dask_here

Whether the minimum version of dask has been detected.

bcolz.min_dask_version

The minimum version of dask needed (dask is optional).

bcolz.min_numexpr_version

The minimum version of numexpr needed (numexpr is optional).

bcolz.ncores

The number of cores detected.

bcolz.numexpr_here

Whether the minimum version of numexpr has been detected.

Top level classes

class bcolz.cparams(clevel=None, shuffle=None, cname=None, quantize=None)

Class to host parameters for compression and other filters.

Parameters:

clevel : int (0 <= clevel < 10)

The compression level.

shuffle : int

The shuffle filter to be activated. Allowed values are bcolz.NOSHUFFLE (0), bcolz.SHUFFLE (1) and bcolz.BITSHUFFLE (2). The default is bcolz.SHUFFLE.

cname : string (‘blosclz’, ‘lz4’, ‘lz4hc’, ‘snappy’, ‘zlib’, ‘zstd’)

Select the compressor to use inside Blosc.

quantize : int (number of significant digits)

Quantize data to improve (lossy) compression. Data is quantized using np.around(scale*data)/scale, where scale is 2**bits, and bits is determined from the quantize value. For example, if quantize=1, bits will be 4. 0 means that the quantization is disabled.

In case some of the parameters are not passed, they will be set to a default (see `setdefaults()` method).
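The quantize rule described above can be sketched in pure Python. This is only an illustration, not the bcolz code path: `quantize_value` is a hypothetical helper, the built-in `round` stands in for `np.around`, and `bits=4` is used because that is the value the text gives for quantize=1. How bits is derived for other quantize values is not covered here.

```python
def quantize_value(value, bits):
    """Round value to the nearest multiple of 1 / 2**bits (lossy)."""
    scale = 2 ** bits
    # Same shape as np.around(scale * data) / scale, applied to one scalar.
    return round(scale * value) / scale


# bits=4 is the value the text gives for quantize=1, so scale is 16.
original = 0.1234
quantized = quantize_value(original, bits=4)
print(original, "->", quantized)
```

The quantized value is a multiple of 1/16, so it needs fewer significant bits, which is what gives the compressor more to work with.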

Attributes

clevel The compression level.
cname The compressor name.
quantize Quantize filter.
shuffle Shuffle filter.

Methods

setdefaults([clevel, shuffle, cname, quantize]) Change the defaults for compression params.

static setdefaults(clevel=None, shuffle=None, cname=None, quantize=None)

Change the defaults for compression params.

Parameters:

clevel : int (0 <= clevel < 10)

The compression level.

shuffle : int

The shuffle filter to be activated. Allowed values are bcolz.NOSHUFFLE (0), bcolz.SHUFFLE (1) and bcolz.BITSHUFFLE (2). The default is bcolz.SHUFFLE.

cname : string (‘blosclz’, ‘lz4’, ‘lz4hc’, ‘snappy’, ‘zlib’, ‘zstd’)

Select the compressor to use inside Blosc.

quantize : int (number of significant digits)

Quantize data to improve (lossy) compression. Data is quantized using np.around(scale*data)/scale, where scale is 2**bits, and bits is determined from the quantize value. For example, if quantize=1, bits will be 4. 0 means that the quantization is disabled.

If this method is not called, the defaults will be set as in defaults.py: ``{clevel=5, shuffle=bcolz.SHUFFLE, cname='lz4', quantize=None}``.

class bcolz.attrs.attrs(rootdir, mode, _new=False)

Accessor for attributes in carray/ctable objects.

This class behaves very similarly to a dictionary, and attributes can be appended in the typical way:

attrs['myattr'] = value

And can be retrieved similarly:

value = attrs['myattr']

Attributes can be removed with:

del attrs['myattr']

This class also honors the __iter__ and __len__ special functions. Moreover, a getall() method returns all the attributes as a dictionary.

CAVEAT: The values should be able to be serialized with JSON for persistence.
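One way to respect the caveat above is to check a value with the standard `json` module before storing it as an attribute. The helper name `is_json_serializable` is hypothetical and is not part of bcolz; this is a sketch of a pre-check, not of how attrs persists its values.

```python
import json


def is_json_serializable(value):
    """Return True if json.dumps accepts value, False otherwise."""
    try:
        json.dumps(value)
    except (TypeError, ValueError):
        return False
    return True


print(is_json_serializable({"units": "m/s", "scale": 2.5}))
print(is_json_serializable({1, 2, 3}))  # a set has no JSON representation
```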

Methods

getall

Also, see the carray and ctable classes below.

Top level functions

bcolz.arange([start, ]stop, [step, ]dtype=None, **kwargs)

Return evenly spaced values within a given interval.

Values are generated within the half-open interval [start, stop) (in other words, the interval including start but excluding stop). For integer arguments the function is equivalent to the Python built-in range function, but returns a carray rather than a list.

Parameters:

start : number, optional

Start of interval. The interval includes this value. The default start value is 0.

stop : number

End of interval. The interval does not include this value.

step : number, optional

Spacing between values. For any output out, this is the distance between two adjacent values, out[i+1] - out[i]. The default step size is 1. If step is specified, start must also be given.

dtype : dtype

The type of the output array. If dtype is not given, infer the data type from the other input arguments.

kwargs : list of parameters or dictionary

Any parameter supported by the carray constructor.

Returns:

out : carray

Bcolz object made of evenly spaced values.

For floating point arguments, the length of the result is ceil((stop - start)/step). Because of floating point round-off, this rule may result in the last element of out being greater than stop.
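The length rule can be checked with a few lines of standard-library Python. `expected_length` is a hypothetical helper that only evaluates the formula quoted above; it does not call bcolz.

```python
import math


def expected_length(start, stop, step):
    """Number of items implied by the ceil((stop - start) / step) rule."""
    return math.ceil((stop - start) / step)


print(expected_length(0, 10, 1))
print(expected_length(0.0, 1.0, 0.3))
```

With a step of 0.3 the quotient is not a whole number, so the rule rounds it up to the next integer.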

bcolz.eval(expression, vm=None, out_flavor=None, user_dict=None, blen=None, **kwargs)

Evaluate an expression and return the result.

Parameters:

expression : string

A string forming an expression, like ‘2*a+3*b’. The values for ‘a’ and ‘b’ are variable names to be taken from the calling function’s frame. These variables may be scalars, carrays or NumPy arrays.

vm : string

The virtual machine to be used in computations. It can be ‘numexpr’, ‘python’ or ‘dask’. The default is to use ‘numexpr’ if it is installed.

out_flavor : string

The flavor for the out object. It can be ‘bcolz’ or ‘numpy’. If None, the value is taken from bcolz.defaults.out_flavor.

user_dict : dict

A user-provided dictionary where the variables in expression can be found by name.

blen : int

The length of the block to be evaluated in one go internally. The default is a value that has been tested experimentally and that offers a good enough performance / memory usage balance.

kwargs : list of parameters or dictionary

Any parameter supported by the carray constructor.

Returns:

out : bcolz or numpy object

The outcome of the expression. In case out_flavor=’bcolz’, you can adjust the properties of this object by passing any additional arguments supported by the carray constructor in kwargs.
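The idea of resolving the names in an expression string from a user-provided dictionary can be illustrated with Python's built-in `eval`. This is a rough sketch with plain scalars: `evaluate_sketch` is a hypothetical helper, it does not use numexpr or dask, it does not inspect the calling frame, and it does not evaluate in blocks.

```python
def evaluate_sketch(expression, user_dict):
    """Evaluate expression with names resolved from user_dict only."""
    # Built-ins are disabled so that only the supplied names are visible.
    return eval(expression, {"__builtins__": {}}, dict(user_dict))


result = evaluate_sketch("2*a+3*b", {"a": 1, "b": 2})
print(result)
```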

bcolz.fill(shape, dtype=float, dflt=None, **kwargs)

Return a new carray or ctable object of given shape and type, filled with dflt.

Parameters:

shape : int or tuple of ints

Shape of the new array, e.g., (2,3).

dflt : Python or NumPy scalar

The value to be used during the filling process. If None, values are filled with zeros. Also, the resulting carray will have this value as its dflt value.

dtype : data-type, optional

The desired data-type for the array, e.g., numpy.int8. Default is numpy.float64.

kwargs : list of parameters or dictionary

Any parameter supported by the carray constructor.

Returns:

out : carray or ctable

Bcolz object filled with dflt values with the given shape and dtype.

See also

ones, zeros

bcolz.fromiter(iterable, dtype, count, **kwargs)

Create a carray/ctable from an iterable object.

Parameters:

iterable : iterable object

An iterable object providing data for the carray.

dtype : numpy.dtype instance

Specifies the type of the outcome object.

count : int

The number of items to read from iterable. If set to -1, the iterable will be consumed until exhaustion (not recommended, see note below).

kwargs : list of parameters or dictionary

Any parameter supported by the carray/ctable constructors.

Returns:

out : a carray/ctable object

Notes

Please specify count to both improve performance and save memory. It allows fromiter to avoid looping over the iterable twice (which is slow). It also avoids memory leaks (which can be important for large iterables).
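The meaning of count, including the -1 case, can be sketched with `itertools.islice`. The helper `take` is hypothetical and returns a plain list; it only shows how many items are read, not the performance or memory behavior mentioned in the note.

```python
from itertools import islice


def take(iterable, count):
    """Read count items from iterable, or everything when count is -1."""
    if count == -1:
        return list(iterable)
    return list(islice(iterable, count))


print(take(iter(range(10)), 3))
print(take(iter(range(4)), -1))
```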

bcolz.iterblocks(cobj, blen=None, start=0, stop=None)

Iterate over a cobj (carray/ctable) in blocks of size blen.

Parameters:

cobj : carray/ctable object

The bcolz object to be iterated over.

blen : int

The length of the block that is returned. The default is the chunklen, or for a ctable, the minimum of the different column chunklens.

start : int

Where the iterator starts. The default is to start at the beginning.

stop : int

Where the iterator stops. The default is to stop at the end.

Returns:

out : iterable

This iterable returns data blocks as NumPy arrays of homogeneous or structured types, depending on whether cobj is a carray or a ctable object.

See also

whereblocks
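Block-wise iteration as described for iterblocks can be sketched over an ordinary Python list. `iterblocks_sketch` is a hypothetical stand-in: it requires blen explicitly (no chunklen default), yields list slices instead of NumPy arrays, and assumes that the final block may be shorter than blen.

```python
def iterblocks_sketch(seq, blen, start=0, stop=None):
    """Yield consecutive slices of seq[start:stop], each at most blen long."""
    if stop is None:
        stop = len(seq)
    for begin in range(start, stop, blen):
        yield seq[begin:min(begin + blen, stop)]


blocks = list(iterblocks_sketch(list(range(10)), blen=4))
print(blocks)
```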

bcolz.ones(shape, dtype=float, **kwargs)

Return a new carray object of given shape and type, filled with ones.

Parameters:

shape : int or tuple of ints

Shape of the new array, e.g., (2,3).

dtype : data-type, optional

The desired data-type for the array, e.g., numpy.int8. Default is numpy.float64.

kwargs : list of parameters or dictionary

Any parameter supported by the carray constructor.

Returns:

out : carray or ctable

Bcolz object of ones with the given shape and dtype.

See also

fill, zeros

bcolz.zeros(shape, dtype=float, **kwargs)

Return a new carray object of given shape and type, filled with zeros.

Parameters:

shape : int or tuple of ints

Shape of the new array, e.g., (2,3).

dtype : data-type, optional

The desired data-type for the array, e.g., numpy.int8. Default is numpy.float64.

kwargs : list of parameters or dictionary

Any parameter supported by the carray constructor.

Returns:

out : carray or ctable

Bcolz object of zeros with the given shape and dtype.

See also

fill, ones

bcolz.open(rootdir, mode='a')

Open a disk-based carray/ctable.

Parameters:

rootdir : pathname (string)

The directory hosting the carray/ctable object.

mode : the open mode (string)

Specifies the mode in which the object is opened. The supported values are:

  • ‘r’ for read-only
  • ‘w’ for emptying the previous underlying data
  • ‘a’ for allowing read/write on top of existing data

Returns:

out : a carray/ctable object or IOError (if no objects are found)

bcolz.walk(dir, classname=None, mode='a')

Recursively iterate over carray/ctable objects hanging from dir.

Parameters:

dir : string

The directory from which the listing starts.

classname : string

If specified, only objects of this class are returned. The values supported are ‘carray’ and ‘ctable’.

mode : string

The mode in which the object should be opened.

Returns:

out : iterator

Iterator over the objects found.

Top level printing functions

bcolz.array2string(a, max_line_width=None, precision=None, suppress_small=None, separator=' ', prefix="", style=repr, formatter=None)

Return a string representation of a carray/ctable object.

This is the same function as in NumPy. Please refer to the NumPy documentation for more info.

See Also:
set_printoptions(), get_printoptions()

bcolz.get_printoptions()

Return the current print options.

This is the same function as in NumPy. For more info, please refer to the NumPy documentation.

See Also:
array2string(), set_printoptions()

bcolz.set_printoptions(precision=None, threshold=None, edgeitems=None, linewidth=None, suppress=None, nanstr=None, infstr=None, formatter=None)

Set printing options.

These options determine the way floating point numbers in carray objects are displayed. This is the same function as in NumPy. For more info, please refer to the NumPy documentation.

See Also:
array2string(), get_printoptions()

Utility functions

bcolz.set_nthreads(nthreads)

Sets the number of threads to be used during bcolz operation.

This affects both Blosc and Numexpr (if available). If you want to change this number only for Blosc, use blosc_set_nthreads instead.

Parameters:

nthreads : int

The number of threads to be used during bcolz operation.

Returns:

out : int

The previous setting for the number of threads.

bcolz.blosc_set_nthreads(nthreads)

Sets the number of threads that Blosc can use.

Parameters:

nthreads : int

The desired number of threads to use.

Returns:

out : int

The previous setting for the number of threads.

bcolz.detect_number_of_cores()

Return the number of cores in this system.

bcolz.blosc_version()

Return the version of the Blosc library.

bcolz.print_versions()

Print all the versions of packages that bcolz relies on.

bcolz.test(verbose=False, heavy=False)

Run all the tests in the test suite.

If verbose is set, the test suite will emit messages with full verbosity (not recommended unless you are looking into a certain problem).

If heavy is set, the test suite will be run in heavy mode (you should be careful with this because it can take a lot of time and resources from your computer).

The carray class

class bcolz.carray

A compressed and enlargeable data container either in-memory or on-disk.

carray exposes a series of methods for dealing with the compressed container in a NumPy-like way.

Parameters:

array : a NumPy-like object

This is taken as the input to create the carray. It can be any Python object that can be converted into a NumPy object. The data type of the resulting carray will be the same as this NumPy object.

cparams : instance of the cparams class, optional

Parameters to the internal Blosc compressor.

dtype : NumPy dtype

Force this dtype for the carray (rather than the array one).

dflt : Python or NumPy scalar

The value to be used when enlarging the carray. If None, the default is filling with zeros.

expectedlen : int, optional

A guess on the expected length of this object. This will serve to decide the best chunklen used for compression and memory I/O purposes.

chunklen : int, optional

The number of items that fit into a chunk. By specifying it you can explicitly set the chunk size used for compression and memory I/O. Only use it if you know what you are doing.

rootdir : str, optional

The directory where all the data and metadata will be stored. If specified, then the carray object will be disk-based (i.e. all chunks will live on-disk, not in memory) and persistent (i.e. it can be restored in another session, e.g. via the open() top-level function).

safe : bool (defaults to True)

Coerces inputs to array types. Set to false if you always give correctly typed, strided, and shaped arrays and if you never use Object dtype.

mode : str, optional

The mode in which a persistent carray should be created/opened. The values can be:

  • ‘r’ for read-only
  • ‘w’ for read/write. During carray creation, the rootdir will be removed if it exists. During carray opening, the carray will be resized to 0.
  • ‘a’ for append (possible data inside rootdir will not be removed).

Attributes

atomsize atomsize: ‘int’
attrs The attribute accessor.
cbytes The compressed size of this object (in bytes).
chunklen The chunklen of this object (in rows).
chunks chunks: object
cparams The compression parameters for this object.
dflt The default value of this object.
dtype The dtype of this object.
itemsize itemsize: ‘int’
leftover_array Array containing the leftovers chunk (uncompressed chunk)
leftover_bytes Number of bytes in the leftover_array
leftover_elements Number of elements in the leftover_array
leftover_ptr Pointer referring to the leftover_array
len The length (leading dimension) of this object.
mode The mode used to create/open this object.
nbytes The original (uncompressed) size of this object (in bytes).
nchunks Number of chunks in the carray
ndim The number of dimensions of this object.
nleftover The number of leftover elements.
partitions List of tuples indicating the bounds for each chunk
rootdir The on-disk directory used for persistency.
safe Whether or not to perform type/shape checks on every operation.
shape The shape of this object.
size The size of this object.

Methods

append(self, array) Append a numpy array to this instance.
copy(self, **kwargs) Return a copy of this object.
flush(self) Flush data in internal buffers to disk.
free_cachemem(self) Release in-memory cached chunk
iter(self[, start, stop, step, limit, skip, ...]) Iterator with start, stop and step bounds.
next
purge(self) Remove the underlying data for on-disk arrays.
reshape(self, newshape) Returns a new carray containing the same data with a new shape.
resize(self, nitems) Resize the instance to have nitems.
sum(self[, dtype]) Return the sum of the array elements.
trim(self, nitems) Remove the trailing nitems from this instance.
view(self) Create a light weight view of the data in the original carray.
where(self, boolarr[, limit, skip]) Iterator that returns values of this object where boolarr is true.
wheretrue(self[, limit, skip]) Iterator that returns indices where this object is true.

__getitem__

x.__getitem__(key) <==> x[key]

Returns values based on key. All the functionality of ndarray.__getitem__() is supported (including fancy indexing), plus a special support for expressions:

Parameters:

key : string

It will be interpreted as a boolean expression (computed via eval) and the elements where these values are true will be returned as a NumPy array.

See also

eval

__setitem__

x.__setitem__(key, value) <==> x[key] = value

Sets values based on key. All the functionality of ndarray.__setitem__() is supported (including fancy indexing), plus a special support for expressions:

Parameters:

key : string

It will be interpreted as a boolean expression (computed via eval) and the elements where these values are true will be set to value.

See also

eval

append(self, array)

Append a numpy array to this instance.

Parameters:

array : NumPy-like object

The array to be appended. Must be compatible with shape and type of the carray.

atomsize

atomsize: ‘int’

attrs

The attribute accessor.

See also

attrs.attrs

cbytes

The compressed size of this object (in bytes).

chunklen

The chunklen of this object (in rows).

chunks

chunks: object

copy(self, **kwargs)

Return a copy of this object.

Parameters:

kwargs : list of parameters or dictionary

Any parameter supported by the carray constructor.

Returns:

out : carray object

The copy of this object.

cparams

The compression parameters for this object.

dflt

The default value of this object.

dtype

The dtype of this object.

flush(self)

Flush data in internal buffers to disk.

This call should typically be done after performing modifications (__setitem__(), append()) in persistence mode. If you don’t do this, you risk losing part of your modifications.

free_cachemem(self)

Release in-memory cached chunk

itemsize

itemsize: ‘int’

iter(self, start=0, stop=None, step=1, limit=None, skip=0, _next=False)

Iterator with start, stop and step bounds.

Parameters:

start : int

The starting item.

stop : int

The item after which the iterator stops.

step : int

The number of items incremented during each iteration. Cannot be negative.

limit : int

A maximum number of elements to return. The default is to return everything.

skip : int

An initial number of elements to skip. The default is 0.

Returns:

out : iterator

See also

where, wheretrue

leftover_array

Array containing the leftovers chunk (uncompressed chunk)

leftover_bytes

Number of bytes in the leftover_array

leftover_elements

Number of elements in the leftover_array

leftover_ptr

Pointer referring to the leftover_array

len

The length (leading dimension) of this object.

mode

The mode used to create/open this object.

nbytes

The original (uncompressed) size of this object (in bytes).

nchunks

Number of chunks in the carray

ndim

The number of dimensions of this object.

next

nleftover

The number of leftover elements.

partitions

List of tuples indicating the bounds for each chunk

purge(self)

Remove the underlying data for on-disk arrays.

reshape(self, newshape)

Returns a new carray containing the same data with a new shape.

Parameters:

newshape : int or tuple of ints

The new shape should be compatible with the original shape. If an integer, then the result will be a 1-D array of that length. One shape dimension can be -1. In this case, the value is inferred from the length of the array and remaining dimensions.

Returns:

reshaped_array : carray

A copy of the original carray.
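The rule for a -1 entry in newshape can be worked through in pure Python. `infer_shape` is a hypothetical helper, not a bcolz function, and `length` is taken here to be the total number of elements of a one-dimensional array; it only shows how the missing dimension follows from the others.

```python
def infer_shape(length, newshape):
    """Replace a single -1 in newshape so the sizes multiply to length."""
    if isinstance(newshape, int):
        newshape = (newshape,)
    newshape = tuple(newshape)
    if newshape.count(-1) > 1:
        raise ValueError("only one dimension can be -1")
    # Product of all the dimensions that were given explicitly.
    known = 1
    for dim in newshape:
        if dim != -1:
            known *= dim
    if -1 in newshape:
        if known == 0 or length % known != 0:
            raise ValueError("cannot infer the missing dimension")
        return tuple(length // known if dim == -1 else dim for dim in newshape)
    if known != length:
        raise ValueError("total size must stay the same")
    return newshape


print(infer_shape(12, (3, -1)))
print(infer_shape(12, -1))
```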

resize(self, nitems)

Resize the instance to have nitems.

Parameters:

nitems : int

The final length of the object. If nitems is larger than the current length, new items will be appended using self.dflt as filling values.

rootdir

The on-disk directory used for persistency.

safe

Whether or not to perform type/shape checks on every operation.

shape

The shape of this object.

size

The size of this object.

sum(self, dtype=None)

Return the sum of the array elements.

Parameters:

dtype : NumPy dtype

The desired type of the output. If None, the dtype of self is used. An exception is when self has an integer type with less precision than the default platform integer. In that case, the default platform integer is used instead (NumPy convention).

Returns:

out : NumPy scalar with dtype

trim(self, nitems)

Remove the trailing nitems from this instance.

Parameters:

nitems : int

The number of trailing items to be trimmed. If negative, the object is enlarged instead.
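The relationship between trim and resize described above can be sketched with plain lists. Both helpers are hypothetical and return new lists, whereas the carray methods act on the instance itself; `dflt` plays the role of the filling value.

```python
def resize_sketch(items, nitems, dflt=0):
    """Return a copy of items cut or padded with dflt to length nitems."""
    if nitems < 0:
        raise ValueError("nitems must be non-negative")
    if nitems <= len(items):
        return items[:nitems]
    return items + [dflt] * (nitems - len(items))


def trim_sketch(items, nitems, dflt=0):
    """Drop the trailing nitems; a negative nitems enlarges instead."""
    return resize_sketch(items, len(items) - nitems, dflt)


data = [1, 2, 3, 4]
print(trim_sketch(data, 1))
print(trim_sketch(data, -2))
```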

view(self)

Create a light weight view of the data in the original carray.

Returns:

out : carray object

The view of this object.

See also

copy

where(self, boolarr, limit=None, skip=0)

Iterator that returns values of this object where boolarr is true.

This is currently only useful for boolean carrays that are unidimensional.

Parameters:

boolarr : a carray or NumPy array of boolean type

The boolean values.

limit : int

A maximum number of elements to return. The default is to return everything.

skip : int

An initial number of elements to skip. The default is 0.

Returns:

out : iterator

See also

iter, wheretrue

wheretrue(self, limit=None, skip=0)

Iterator that returns indices where this object is true.

This is currently only useful for boolean carrays that are unidimensional.

Parameters:

limit : int

A maximum number of elements to return. The default is to return everything.

skip : int

An initial number of elements to skip. The default is 0.

Returns:

out : iterator

See also

iter, where
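How limit and skip interact for wheretrue can be sketched over a list of booleans. `wheretrue_sketch` is a hypothetical helper built on `itertools.islice`; it assumes that skip is applied to the matching indices first and that limit then caps how many are returned.

```python
from itertools import islice


def wheretrue_sketch(flags, limit=None, skip=0):
    """Iterate over the indices of true items, honoring skip then limit."""
    hits = (index for index, flag in enumerate(flags) if flag)
    stop = None if limit is None else skip + limit
    return islice(hits, skip, stop)


flags = [True, False, True, True, False, True]
print(list(wheretrue_sketch(flags)))
print(list(wheretrue_sketch(flags, limit=2, skip=1)))
```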

The ctable class

class bcolz.ctable.ctable(columns=None, names=None, **kwargs)

This class represents a compressed, column-wise table.

Create a new ctable from cols with optional names.

Parameters:

columns : tuple or list of column objects

The list of column data to build the ctable object. These are typically carrays, but can also be a list of NumPy arrays or a pure NumPy structured array. A list of lists or tuples is valid too, as long as they can be converted into carray objects.

names : list of strings or string

The list of names for the columns. The names in this list must be valid Python identifiers, must not start with an underscore, and must be specified in the same order as the columns. If not passed, the names will be chosen as ‘f0’ for the first column, ‘f1’ for the second and so on (NumPy convention).

kwargs : list of parameters or dictionary

Allows passing additional arguments supported by carray constructors in case new carrays need to be built.

Notes

Columns passed as carrays are not copied, so their settings will stay the same, even if you pass additional arguments (cparams, chunklen...).
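The naming rules for columns can be expressed as two small checks. `default_names` and `is_valid_name` are hypothetical helpers, not bcolz functions. Note that `str.isidentifier` also accepts Python keywords, which this sketch does not filter out.

```python
def default_names(ncols):
    """Names following the 'f0', 'f1', ... convention."""
    return ["f%d" % i for i in range(ncols)]


def is_valid_name(name):
    """Valid Python identifier that does not start with an underscore."""
    return name.isidentifier() and not name.startswith("_")


print(default_names(3))
print([is_valid_name(n) for n in ("price", "_hidden", "2nd")])
```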

Attributes

cbytes The compressed size of this object (in bytes).
cparams The compression parameters for this object.
dtype The data type of this object (numpy dtype).
names The column names of the object (list).
nbytes The original (uncompressed) size of this object (in bytes).
ndim The number of dimensions of this object.
shape The shape of this object.
size The size of this object.

Methods

addcol(newcol[, name, pos, move]) Add a new newcol object as column.
append(cols) Append cols to this ctable.
copy(**kwargs) Return a copy of this ctable.
delcol([name, pos, keep]) Remove the column named name or in position pos.
eval(expression, **kwargs) Evaluate the expression on columns and return the result.
fetchwhere(expression[, outcols, limit, ...]) Fetch the rows fulfilling the expression condition.
flush() Flush data in internal buffers to disk.
free_cachemem() Get rid of internal caches to free memory.
fromdataframe(df, **kwargs) Return a ctable object out of a pandas dataframe.
fromhdf5(filepath[, nodepath]) Return a ctable object out of a compound HDF5 dataset (PyTables Table).
iter([start, stop, step, outcols, limit, ...]) Iterator with start, stop and step bounds.
resize(nitems) Resize the instance to have nitems.
todataframe([columns, orient]) Return a pandas dataframe out of this object.
tohdf5(filepath[, nodepath, mode, cparams, ...]) Write this object into an HDF5 file.
trim(nitems) Remove the trailing nitems from this instance.
where(expression[, outcols, limit, skip, ...]) Iterate over rows where expression is true.
whereblocks(expression[, blen, outcols, ...]) Iterate over the rows that fulfill the expression condition on this ctable, in blocks of size blen.

addcol(newcol, name=None, pos=None, move=False, **kwargs)

Add a new newcol object as column.

Parameters:

newcol : carray, ndarray, list or tuple

If a carray is passed, no conversion will be carried out. If conversion to a carray has to be done, kwargs will apply.

name : string, optional

The name for the new column. If not passed, it will receive an automatic name.

pos : int, optional

The column position. If not passed, it will be appended at the end.

move : boolean, optional

If the new column is an existing, disk-based carray, this selects whether its data directory is copied (False) or moved (True).

kwargs : list of parameters or dictionary

Any parameter supported by the carray constructor.

See also

delcol

Notes

You should not specify both name and pos arguments, unless they are compatible.

append(cols)

Append cols to this ctable.

Parameters:

cols : list/tuple of scalar values, NumPy arrays or carrays

It also can be a NumPy record, a NumPy recarray, or another ctable.

cbytes

The compressed size of this object (in bytes).

cols = None

The ctable columns accessor.

copy(**kwargs)

Return a copy of this ctable.

Parameters:

kwargs : list of parameters or dictionary

Any parameter supported by the carray/ctable constructor.

Returns:

out : ctable object

The copy of this ctable.

cparams

The compression parameters for this object.

delcol(name=None, pos=None, keep=False)

Remove the column named name or in position pos.

Parameters:

name : string, optional

The name of the column to remove.

pos : int, optional

The position of the column to remove.

keep : boolean

For disk-backed columns: keep the data on disk?

See also

addcol

Notes

You must specify at least a name or a pos. You should not specify both name and pos arguments, unless they are compatible.

dtype

The data type of this object (numpy dtype).

eval(expression, **kwargs)

Evaluate the expression on columns and return the result.

Parameters:

expression : string

A string forming an expression, like ‘2*a+3*b’. The values for ‘a’ and ‘b’ are variable names to be taken from the calling function’s frame. These variables may be column names in this table, scalars, carrays or NumPy arrays.

kwargs : list of parameters or dictionary

Any parameter supported by the eval() top level function.

Returns:

out : bcolz object

The outcome of the expression. You can tailor the properties of this object by passing additional arguments supported by the carray constructor in kwargs.

See also

eval

fetchwhere(expression, outcols=None, limit=None, skip=0, out_flavor=None, user_dict={}, vm=None, **kwargs)

Fetch the rows fulfilling the expression condition.

Parameters:

expression : string or carray

A boolean Numexpr expression or a boolean carray.

outcols : list of strings or string

The list of column names that you want to get back in results. Alternatively, it can be specified as a string such as ‘f0 f1’ or ‘f0, f1’. If None, all the columns are returned. If the special name ‘nrow__’ is present, the row number will be included in the output.

limit : int

A maximum number of elements to return. The default is to return everything.

skip : int

An initial number of elements to skip. The default is 0.

out_flavor : string

The flavor for the out object. It can be ‘bcolz’ or ‘numpy’. If None, the value is taken from bcolz.defaults.out_flavor.

user_dict : dict

A user-provided dictionary where the variables in expression can be found by name.

vm : string

The virtual machine to be used in computations. It can be ‘numexpr’, ‘python’ or ‘dask’. The default is to use ‘numexpr’ if it is installed.

kwargs : list of parameters or dictionary

Any parameter supported by the carray constructor.

Returns:

out : bcolz or numpy object

The outcome of the expression. In case out_flavor=’bcolz’, you can adjust the properties of this object by passing any additional arguments supported by the carray constructor in kwargs.

See also

whereblocks
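The two accepted string spellings for outcols (‘f0 f1’ and ‘f0, f1’) can be normalized to a list of names as follows. `parse_outcols` is a hypothetical helper written for illustration; it does not give the special name ‘nrow__’ any particular treatment and it does not check the names against the table.

```python
def parse_outcols(outcols, all_names):
    """Normalize outcols to a list of column names."""
    if outcols is None:
        # None means that every column is returned.
        return list(all_names)
    if isinstance(outcols, str):
        # Accept both 'f0 f1' and 'f0, f1'.
        return outcols.replace(",", " ").split()
    return list(outcols)


names = ["f0", "f1", "f2"]
print(parse_outcols("f0, f1", names))
print(parse_outcols("f0 f1", names))
print(parse_outcols(None, names))
```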

flush()

Flush data in internal buffers to disk.

This call should typically be done after performing modifications (__setitem__(), append()) in persistence mode. If you don’t do this, you risk losing part of your modifications.

free_cachemem()

Get rid of internal caches to free memory.

This call can typically be made after reading from a carray/ctable so as to free the memory used internally to cache data blocks/chunks.

static fromdataframe(df, **kwargs)

Return a ctable object out of a pandas dataframe.

Parameters:

df : DataFrame

A pandas dataframe.

kwargs : list of parameters or dictionary

Any parameter supported by the ctable constructor.

Returns:

out : ctable object

A ctable filled with values from df.

Notes

The ‘object’ dtype will be converted into a ‘S’tring type, if possible. This allows for much better storage savings in bcolz.

static fromhdf5(filepath, nodepath='/ctable', **kwargs)

Return a ctable object out of a compound HDF5 dataset (PyTables Table).

Parameters:

filepath : string

The path of the HDF5 file.

nodepath : string

The path of the node inside the HDF5 file.

kwargs : list of parameters or dictionary

Any parameter supported by the ctable constructor.

Returns:

out : ctable object

A ctable filled with values from the HDF5 node.

See also

ctable.tohdf5

iter(start=0, stop=None, step=1, outcols=None, limit=None, skip=0, out_flavor=<function namedtuple>)

Iterator with start, stop and step bounds.

Parameters:

start : int

The starting item.

stop : int

The item after which the iterator stops.

step : int

The number of items incremented during each iteration. Cannot be negative.

outcols : list of strings or string

The list of column names that you want to get back in results. Alternatively, it can be specified as a string such as ‘f0 f1’ or ‘f0, f1’. If None, all the columns are returned. If the special name ‘nrow__’ is present, the row number will be included in the output.

limit : int

A maximum number of elements to return. The default is to return everything.

skip : int

An initial number of elements to skip. The default is 0.

out_flavor : namedtuple, tuple or ndarray

Whether the returned rows are namedtuples, tuples or ndarrays. The default is namedtuples.

Returns:

out : iterable

See also

where

names

The column names of the object (list).

nbytes

The original (uncompressed) size of this object (in bytes).

ndim

The number of dimensions of this object.

resize(nitems)

Resize the instance to have nitems.

Parameters:

nitems : int

The final length of the instance. If nitems is larger than the current length, new items will be appended using self.dflt as filling values.

shape

The shape of this object.

size

The size of this object.

todataframe(columns=None, orient='columns')

Return a pandas dataframe out of this object.

Parameters:

columns : sequence of column labels, optional

Must be passed if orient=’index’.

orient : {‘columns’, ‘index’}, default ‘columns’

The “orientation” of the data. If the keys of the input correspond to column labels, pass ‘columns’ (default). Otherwise if the keys correspond to the index, pass ‘index’.

Returns:

out : DataFrame

A pandas DataFrame filled with values from this object.

tohdf5(filepath, nodepath='/ctable', mode='w', cparams=None, cname=None)

Write this object into an HDF5 file.

Parameters:

filepath : string

The path of the HDF5 file.

nodepath : string

The path of the node inside the HDF5 file.

mode : string

The mode to open the PyTables file. Default is ‘w’rite mode.

cparams : cparams object

The compression parameters. The defaults are the same as for the current bcolz environment.

cname : string

Any of the compressors supported by PyTables (e.g. ‘zlib’). The default is to use ‘blosc’ as meta-compressor in combination with one of its compressors (see cparams parameter above).

See also

ctable.fromhdf5

trim(nitems)

Remove the trailing nitems from this instance.

Parameters:

nitems : int

The number of trailing items to be trimmed.

where(expression, outcols=None, limit=None, skip=0, out_flavor=<function namedtuple>, user_dict={}, vm=None)

Iterate over rows where expression is true.

Parameters:

expression : string or carray

A boolean Numexpr expression or a boolean carray.

outcols : list of strings or string

The list of column names that you want to get back in results. Alternatively, it can be specified as a string such as ‘f0 f1’ or ‘f0, f1’. If None, all the columns are returned. If the special name ‘nrow__’ is present, the row number will be included in the output.

limit : int

A maximum number of elements to return. The default is to return everything.

skip : int

An initial number of elements to skip. The default is 0.

out_flavor : namedtuple, tuple or ndarray

Whether the returned rows are namedtuples, tuples or ndarrays. The default is namedtuples.

user_dict : dict

A user-provided dictionary where the variables in expression can be found by name.

vm : string

The virtual machine to be used in computations. It can be ‘numexpr’, ‘python’ or ‘dask’. The default is to use ‘numexpr’ if it is installed.

Returns:

out : iterable

See also

iter

whereblocks(expression, blen=None, outcols=None, limit=None, skip=0, user_dict={}, vm=None)

Iterate over the rows that fulfill the expression condition on this ctable, in blocks of size blen.

Parameters:

expression : string or carray

A boolean Numexpr expression or a boolean carray.

blen : int

The length of the block that is returned. The default is the chunklen, or for a ctable, the minimum of the different column chunklens.

outcols : list of strings or string

The list of column names that you want to get back in results. Alternatively, it can be specified as a string such as ‘f0 f1’ or ‘f0, f1’. If None, all the columns are returned. If the special name ‘nrow__’ is present, the row number will be included in the output.

limit : int

A maximum number of elements to return. The default is to return everything.

skip : int

An initial number of elements to skip. The default is 0.

user_dict : dict

A user-provided dictionary where the variables in expression can be found by name.

vm : string

The virtual machine to be used in computations. It can be ‘numexpr’, ‘python’ or ‘dask’. The default is to use ‘numexpr’ if it is installed.

Returns:

out : iterable

The iterable returns numpy objects of blen length.

See also

iterblocks (in the top level functions)