Welcome to BitClust’s documentation
Description
BitClust is a Python command-line interface (CLI) for fast clustering of relatively long Molecular Dynamics trajectories using Daura’s algorithm [1]. The clusters it retrieves are roughly equivalent to those reported by VMD’s internal command measure cluster, but they are computed much faster (see the benchmark section for details).
BitClust offers a classical trade-off: RAM for speed. It calculates all pairwise distances between frames once, at the start of a clustering job, and keeps them in memory instead of recalculating them every time a cluster is found.
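To make the trade-off concrete, here is a minimal sketch of Daura's algorithm as it is commonly described: the frame with the most neighbours within the cutoff becomes a cluster centre, that frame and its neighbours are removed from the pool, and the search repeats. It is not BitClust's code. The function name `daura_clusters`, the plain list-of-lists distance matrix, and the lowest-index tie-break are assumptions made for illustration.

```python
def daura_clusters(dist, cutoff):
    """Cluster frame indices from a precomputed, symmetric distance matrix.

    Illustrative sketch only: `dist` is a list of lists and the name and
    tie-breaking rule are assumptions, not BitClust's implementation.
    """
    n = len(dist)
    # Pay the memory cost once: each frame's neighbour set is built a single
    # time and reused in every iteration instead of being recomputed.
    neighbours = [
        {j for j in range(n) if dist[i][j] <= cutoff} for i in range(n)
    ]
    unclustered = set(range(n))
    clusters = []
    while unclustered:
        # The frame with the most still-unclustered neighbours becomes the
        # centre; iterating in sorted order makes ties go to the lowest index.
        centre = max(
            sorted(unclustered),
            key=lambda i: len(neighbours[i] & unclustered),
        )
        members = neighbours[centre] & unclustered
        clusters.append((centre, sorted(members)))
        # Remove the whole cluster from the pool before the next search.
        unclustered -= members
    return clusters


# Toy "trajectory": six frames placed on a line, distance = |x_i - x_j|.
positions = [0.0, 0.1, 0.2, 1.0, 1.1, 5.0]
toy_dist = [[abs(a - b) for b in positions] for a in positions]
toy_clusters = daura_clusters(toy_dist, cutoff=0.15)
```

With the toy data, frame 1 has three neighbours and is picked first, then frames 3 and 4 form the second cluster, and frame 5 is left as a singleton. Without the stored neighbour sets, every pass of the loop would have to recompute distances for all remaining frames.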
Memory use has been heavily optimized by encoding similarity distances as bits (0 if the distance is less than or equal to a specified threshold, 1 otherwise). This encoding reduces storage by at least 32X/64X compared with similar algorithms that save the same information as single/double-precision float values.
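The sketch below illustrates the encoding and the arithmetic behind the 32X/64X figure. It uses a plain Python integer as a stand-in for a bit vector, so it is not how BitClust stores its data. The helper names and the 100,000-frame trajectory size are hypothetical; the bit convention (0 for a distance within the threshold, 1 otherwise) follows the description above.

```python
def encode_row(distances, cutoff):
    """Pack one row of distances into an int used as a bit vector.

    Bit j stays 0 if distances[j] <= cutoff and is set to 1 otherwise.
    """
    bits = 0
    for j, d in enumerate(distances):
        if d > cutoff:
            bits |= 1 << j
    return bits


def count_neighbours(bits, n_frames):
    """Frames within the cutoff are the 0 bits among the first n_frames."""
    return n_frames - bin(bits).count("1")


def matrix_bytes(n_frames, bits_per_entry):
    """Bytes needed for a full n_frames x n_frames matrix."""
    return n_frames * n_frames * bits_per_entry // 8


# One row of a hypothetical distance matrix with a 0.15 cutoff.
row = [0.0, 0.12, 0.30, 0.09, 0.45]
encoded = encode_row(row, cutoff=0.15)      # bits 2 and 4 are set
within_cutoff = count_neighbours(encoded, len(row))

# Storage for a hypothetical 100,000-frame trajectory.
n = 100_000
as_bits = matrix_bytes(n, 1)       # 1.25 GB
as_float32 = matrix_bytes(n, 32)   # 40 GB
as_float64 = matrix_bytes(n, 64)   # 80 GB
```

Once the distances are reduced to bits, counting a frame's neighbours becomes a bit count, and the full matrix for this hypothetical trajectory fits in 1.25 GB rather than 40 GB or 80 GB.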
Main Dependencies
BitClust stands on the shoulders of two giants:
- MDTraj, which calculates pairwise RMSD distances between all frames of a trajectory very quickly and in parallel, and
- bitarray, a third-party Python library that provides a memory-efficient bit-vector (bit array) data structure and the set of bitwise operations at the very heart of our clustering implementation.
Citation
If you make use of BitClust in your scientific work, BitCool and cite it ;)