Structure of the Data container

Structure of the Data container#

There is a specific class that the clustering module of DRIVE creates. This class is called “Data” and is passed to each plugin (see code block below). This class has no methods. The following section will describe the class attributes so the user understands what information each plugin can access by default.

@dataclass
class Data:
    """main class to hold the data from the network analysis and the different pvalues"""

    networks: List[Network_Interface]
    output_path: Path
    carriers: Dict[str, Dict[str, List[str]]]
    phenotype_descriptions: Dict[str, Dict[str, str]]

Changing the Data class attributes

The attributes described in this section are what DRIVE comes with by default. Users can add additional attributes through custom plugins. The advantages/disadvantages of customizing the class attributes will be discussed in further detail in a future section on creating custom plugins.

Data class attributes:#

  • networks: This is a list of all the networks identified by the DRIVE clustering analysis. This list consists of Network objects that are shown by the code block below. These objects have information about the individuals in the cluster, the haplotypes in the cluster, and the size of the cluster. The Network class has attributes for pvalues, and min_pvalue_str but these values are empty until they are calculated by the “pvalues” plugin.

@dataclass
class Network:
    clst_id: float
    true_positive_count: int
    true_positive_percent: float
    false_negative_edges: List[int]
    false_negative_count: int
    members: Union[Set[int], Set[str]]
    haplotypes: Union[List[int], List[str]]
    min_pvalue_str: str = ""
    pvalues: Dict[str, Dict[str, Any]] = field(default_factory=dict)

    def print_members_list(self) -> str:
        """Returns a string that has all of the members ids separated by space

        Returns
        -------
        str
            returns a string where the members list attribute
            is formatted as a string for the output file. Individuals strings are joined by comma.
        """
        return ", ".join(list(map(str, self.members)))

    def __lt__(self, comp_class: T) -> bool:
        """Override the less than method so that objects can be sorted in
        ascending numeric order based on cluster id.

        Parameters
        ----------
        comp_class : Network
            Network object to compare.

        Returns
        -------
        bool
            returns true if the self cluster id is less than the
            comp_class cluster id.
        """

        return self.clst_id < comp_class.clst_id
  • output: This attribute is a path object that describes where to write the drive output to. This path will have a file prefix that drive appends a suffix to when it writes to a file. If you wish to get the parent directory you can just use network_obj.output.parent.

  • carriers: This attribute is a dictionary of dictionaries that tells who cases, controls, and exclusions are for each phenotype. The outer key is the phenotype id. The inner dictionary has 3 keys: “cases”, “controls”, “excluded”. The values for each of these keys are a list that contains the individual ids for each case, control, or excluded individual, respectively.

  • phenotype_descriptions: This attribute is another dictionary of dictionaries that has a description of each phenotype. The outer key is the phenotype id. The inner key is the string phenotype and has the phenotype description as a value.