Code

I contribute to several open-source projects hosted on GitHub.

  • INDRA (the Integrated Network and Dynamical Reasoning Assembler) assembles information about biochemical mechanisms into a common format that can be used to build several different kinds of explanatory models. Sources of mechanistic information include pathway databases, natural language descriptions of mechanisms by human curators, and findings extracted from the literature by text mining. Mechanistic information from multiple sources is de-duplicated, standardized and assembled into sets of mechanistic Statements with associated evidence. Sets of Statements can then be used to assemble both executable rule-based models (using PySB) and a variety of different types of network models.
  • FamPlex. A resource for improving named entity recognition, grounding, and relationship resolution in biomedical text mining. The development of FamPlex grew out of our experience using existing software for extracting molecular mechanisms from the scientific literature. We found that named entities extracted from text (e.g., genes and proteins) frequently lacked associated identifiers or were linked to incorrect identifiers. These errors of entity linking (or “grounding”) were particularly pernicious in the case of protein families and protein complexes. To address this, we created FamPlex, which consists of an independent set of identifiers for protein families and complexes along with links to (i) lexical synonyms consistent with the common appearance of terms in text (e.g., “Akt”), (ii) constituent genes and proteins (e.g., for Akt the genes AKT1, AKT2, and AKT3) and (iii) identifiers in other curated resources (e.g., Reactome ID R-HSA-202074). We also curated 137 biology-specific prefixes and suffixes associated with gene/protein names such as “FLAG-AKT1” or “pAKT1” that are often incorrectly handled by NLP. See blog post here.
  • PySB. A rule-based language for modeling biochemical pathways embedded within Python. Facilitates the creation of transparent, reusable, and composable models. Built on top of the rule-based languages BNGL and Kappa and integrated with NumPy/SciPy/Matplotlib (see Publications).
  • Extrinsic Apoptosis Reaction Model, 2.0 (EARM 2). A family of biochemical models of the extrinsic apoptosis pathway, focused on exploring possible mechanisms for regulation of mitochondrial outer membrane permeabilization by the Bcl-2 protein family.
  • BayesSB. Markov chain Monte Carlo for parameter estimation of biological and biochemical models implemented in PySB.
  • EstCC. Scala package for calculating channel capacity, mutual information, and entropy for continuous and discrete variables.
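
To make the FamPlex description above more concrete, here is a minimal Python sketch of the kind of lookup it enables: a family entry carrying synonyms, constituent genes, and a cross-reference, plus prefix stripping for mentions such as “pAKT1”. This is an illustration only and not the FamPlex implementation or file format. The names `FamilyEntry`, `ENTRIES`, `GENE_NAMES`, `PREFIXES`, `strip_prefix`, and `ground` are all hypothetical, and the two prefixes shown are a toy stand-in for the curated affix list.

```python
from dataclasses import dataclass, field


@dataclass
class FamilyEntry:
    """Hypothetical record for one protein family or complex."""
    identifier: str                 # family-level identifier
    synonyms: set                   # lowercased lexical synonyms seen in text
    members: list                   # constituent genes/proteins
    xrefs: dict = field(default_factory=dict)  # links to other resources


# Toy table holding only the Akt example from the text (assumed layout).
ENTRIES = [
    FamilyEntry(
        identifier="AKT",
        synonyms={"akt"},
        members=["AKT1", "AKT2", "AKT3"],
        xrefs={"reactome": "R-HSA-202074"},
    )
]

# Toy stand-ins: a real system would use a full gene-name resource and
# the complete curated list of prefixes and suffixes.
GENE_NAMES = {"AKT1", "AKT2", "AKT3"}
PREFIXES = ("FLAG-", "p")


def strip_prefix(text):
    """Drop a known prefix, but only if what remains is a known gene name."""
    for prefix in PREFIXES:
        rest = text[len(prefix):]
        if text.startswith(prefix) and rest in GENE_NAMES:
            return rest
    return text


def ground(text):
    """Return (namespace, identifier) for a text mention, or None."""
    candidate = strip_prefix(text)
    if candidate in GENE_NAMES:
        return ("gene", candidate)
    for entry in ENTRIES:
        if candidate.lower() in entry.synonyms:
            return ("family", entry.identifier)
    return None


if __name__ == "__main__":
    for mention in ["Akt", "pAKT1", "FLAG-AKT1", "XYZ"]:
        print(mention, "->", ground(mention))
```

In this sketch a family mention such as “Akt” resolves to the family-level identifier rather than being forced onto a single gene, while its `members` list preserves the link to AKT1, AKT2, and AKT3.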