DRIVE FAQ


This section addresses frequently asked questions and provides solutions to common problems identified by users.

What are the main differences between DRIVE v1 and DRIVE v3?

DRIVE v3 adds a handful of changes and improvements that distinguish it from v1:

Addition of phenotype enrichment test

The original implementation of DRIVE only performed network clustering. DRIVE v3 added a phenotypic enrichment test that users can enable by providing a case/control file. This enrichment test uses binomial statistics to determine if a network is enriched for cases compared to the total cohort. This test will be performed for all networks that have 2 or more cases. Users can customize this test with their own code using the plugin architecture of DRIVE. This new test is also generalized so that users can provide a file with case/control definitions for multiple phecodes. This generalization allows users to run a PheWES (Phenome-wide Enrichment Study) using a phenotype file format similar to what is required by many PheWAS tools.
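
The idea behind the enrichment test can be illustrated with a minimal sketch. It is not DRIVE's actual code: the function names and the choice of a one-sided (upper-tail) test are assumptions made for illustration, and the null probability is assumed to be the proportion of cases in the total cohort. Only the "2 or more cases" cutoff comes from the description above.

```python
from math import comb

MIN_CASES = 2  # from the text: only networks with 2 or more cases are tested


def enrichment_pvalue(cases_in_network: int, network_size: int,
                      cohort_case_rate: float) -> float:
    """Upper-tail binomial probability P(X >= cases_in_network).

    X ~ Binomial(network_size, cohort_case_rate), i.e. the number of cases
    expected in a network of this size if cases were spread at the cohort rate.
    """
    if not 0.0 <= cohort_case_rate <= 1.0:
        raise ValueError("cohort_case_rate must be between 0 and 1")
    if not 0 <= cases_in_network <= network_size:
        raise ValueError("cases_in_network must be between 0 and network_size")
    n, k, p = network_size, cases_in_network, cohort_case_rate
    tail = sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))
    return min(1.0, tail)  # guard against floating-point overshoot


def network_enrichment(cases_in_network: int, network_size: int,
                       cohort_case_rate: float):
    """Return a p-value, or None when the network has too few cases to test."""
    if cases_in_network < MIN_CASES:
        return None
    return enrichment_pvalue(cases_in_network, network_size, cohort_case_rate)


# Hypothetical numbers: 100 cases in a cohort of 1,000 (rate 0.1),
# and a network of 10 individuals that contains 5 cases.
print(f"{network_enrichment(5, 10, 100 / 1000):.4f}")  # 0.0016
```

The same tail probability can be obtained with `scipy.stats.binomtest(k, n, p, alternative="greater")`. Whether DRIVE uses that function, or a one-sided alternative at all, is not stated here.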

Extensibility through plugins

DRIVE v3 is designed to interface with existing analytical pipelines through a flexible and extensible backend. This backend relies on the plugin architecture described in more detail in the Plugin Description. Users can create their own “plugins” to perform additional analyses or output data in a more convenient format. This flexibility allows users to adapt DRIVE to their use cases without having to wait for formal updates to DRIVE from the Below Lab. For more information about how DRIVE stores network data, see the Data API; for an example of a valid plugin, see the plugin template.

Performance increases

In designing DRIVE v3, we took advantage of features of common data science libraries such as Pandas, PyArrow, and DuckDB to boost performance. Current profiling shows a ~35x improvement when running only the clustering algorithm over the CFTR locus in pairwise IBD segments for 250,000 individuals. This performance increase resulted from moving the IBD segment I/O and filtering to DuckDB. In v1, DRIVE read this data one line at a time and appended each new line to a growing pandas dataframe. This process resulted in many memory allocations, which became slower as the dataframe grew in size. DuckDB enables DRIVE to use multiple threads to read in and filter the file. With the inclusion of DuckDB, this step of DRIVE is now multi-threaded, but the rest of the runtime, which relies on pandas, iGraph, and scipy, is still single-threaded.
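
To see why the v1 line-by-line approach slowed down as the dataframe grew, consider the toy cost model below. It is not DRIVE code and does not use pandas or DuckDB. It assumes that each append rebuilds the whole table, as happens when a pandas dataframe is grown by concatenation, and counts how many existing rows get copied. An immutable tuple stands in for the dataframe, and both function names are hypothetical.

```python
def rows_copied_appending(n_rows: int) -> int:
    """Rows copied when the table is rebuilt from scratch on every append."""
    table = ()
    copied = 0
    for row in range(n_rows):
        copied += len(table)    # every existing row is copied into the new table
        table = table + (row,)  # a new tuple is allocated each time
    return copied


def rows_copied_bulk(n_rows: int) -> int:
    """Rows copied when rows are collected first and the table is built once."""
    rows = list(range(n_rows))  # cheap, amortised appends
    table = tuple(rows)         # a single copy into the final table
    return len(table)


for n in (1_000, 10_000):
    print(n, rows_copied_appending(n), rows_copied_bulk(n))
# 1000 499500 1000
# 10000 49995000 10000
```

Under this model the append-per-line strategy copies n(n-1)/2 rows in total, so its cost grows quadratically with file size, while a bulk read grows linearly. Reading and filtering the file in one pass avoids the quadratic term regardless of how many threads are used.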

DRIVE v1 performance compared to DRIVE v3
DRIVE version	Runtime (Wall Clock)	Runtime (User Time)	Memory Consumption
v1	36h 16m	20h 25m	3.1 Gb
v3	1h 3m	1h 11m	7.1 Gb
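
As a quick check of the figures above, the ratios can be computed directly from the table. This sketch assumes that the ~35x improvement quoted earlier refers to the wall-clock column; the variable names are illustrative only.

```python
def to_minutes(hours: int, minutes: int) -> int:
    """Convert an 'Hh Mm' runtime into whole minutes."""
    return hours * 60 + minutes


# Values taken from the table: v1 versus v3
wall_speedup = to_minutes(36, 16) / to_minutes(1, 3)   # 2176 min / 63 min
user_speedup = to_minutes(20, 25) / to_minutes(1, 11)  # 1225 min / 71 min
memory_ratio = 7.1 / 3.1                               # v3 memory / v1 memory

print(f"wall-clock speedup: {wall_speedup:.1f}x")  # 34.5x
print(f"user-time speedup:  {user_speedup:.1f}x")  # 17.3x
print(f"memory increase:    {memory_ratio:.1f}x")  # 2.3x
```

The wall-clock ratio of roughly 34.5 is consistent with the ~35x figure, and it comes at the cost of about 2.3 times the memory. For v3 the user time slightly exceeds the wall-clock time, which is what one would expect when part of the run executes on multiple threads.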

Improved logging and error handling

DRIVE v1 did not use any logging and often crashed ungracefully when it encountered errors. DRIVE now has more robust error handling and logging functionality, which the user can customize through the verbosity flag “-v”. There are almost certainly still ways to get the program to crash, but we have attempted to cover many of the errors commonly encountered in development. If you encounter new errors that you think are worth handling, please let us know by submitting a GitHub issue so we can reproduce the error and determine the best way to handle it.

Incorporation of the ability to generate dendrograms into the DRIVE codebase

In the original publication using DRIVE v1, the dendrogram of a network of interest was visualized using the phylogenetic tree generator ATGC: FastME. This approach required the user to rely on a second software tool not maintained by the Below Lab. For DRIVE v3, we implemented our own dendrogram generation using scipy and packaged it in a DRIVE subcommand called dendrogram. This approach allows us to ensure that the dendrogram functionality stays consistent and is optimized to work with the DRIVE output without requiring the user to perform extensive post-processing.
