We need to talk about how we talk about protein families and complexes

Posted on July 08, 2018 in Big Mechanism, INDRA

(Our paper on FamPlex, a semantic resource for improving text mining and biocuration for protein families and complexes, is available in BMC Bioinformatics. A synopsis follows below.)

When Ben Gyori and I started working with natural language processing (NLP) systems as part of the DARPA Big Mechanism program, we found …

Continue reading

Building a Python 2/3 compatible Unicode Sandwich

Posted on March 10, 2017 in programming • Tagged with python

So you've decided that your code needs to be compatible with both Python 2 and Python 3. Most likely, you're upgrading your Python 2 code to work in Python 3, and know that you need to do things like:

  • Replace all calls to print with print()
  • Use absolute rather than …

Continue reading

What fraction of articles can you expect to be available for text mining from PubMed Central?

Posted on May 20, 2016 in text mining • Tagged with text mining, ras model

In a previous post, I described assembling sets of PubMed references relevant to 227 genes in the Ras pathway. The next problem was getting access to mineable content.

The most readily available source of content for text mining by researchers is the PubMed Central Open Access article subset, and most …

Continue reading

Assembling a text-mining corpus for the Ras pathway

Posted on May 19, 2016 in text mining • Tagged with text mining, ras model

Over the last year and a half or so I've been involved in the Big Mechanism program sponsored by DARPA. The practical goal of this program is to develop software systems to extract facts from the scientific literature by text mining and, from these facts, assemble causal, mechanistic models that …

Continue reading