NLP for facilitating and accelerating curation in RegulonDB
Manual summary | Automatic summary |
ArgR has two domains: the N-terminal domain, which contains a winged-helix-turn-helix DNA-binding motif, and the C-terminal domain, which contains a motif that binds L-arginine and a motif for oligomerization. Based on cross-linking analysis of wild-type and mutant ArgR proteins, it has been shown that the C-terminus is more important in cer/Xer site-specific recombination than in DNA-binding. ArgR complexed with L-arginine represses the transcription of several genes involved in biosynthesis and transport of arginine, transport of histidine, and its own synthesis, and activates genes for arginine catabolism. ArgR is also essential for a site-specific recombination reaction that resolves plasmid ColE1 multimers to monomers and is necessary for plasmid stability. | Results The domain structure of ArgR. The mutagenesis results of two laboratories have shown that the ArgR subunit is made up of two functional regions: a basic N-terminal half responsible for DNA-binding and an acidic C-terminal half responsible for both oligomerization and for binding arginine (Burke et al., 1994; Tian & Maas, 1994). We overexpressed the C-terminal domain of ArgR (ArgRc) corresponding to amino acids 80 to 156 in a T7 polymerase-driven system and purified the protein to homogeneity. Discussion The C-terminal domain of ArgR forms a hexameric protein core that contains the binding sites for L-arginine and provides a central, symmetric scaffold for six DNA-binding domains. In addition to regulating the transcription of arginine biosynthetic genes, ArgR plays an obligatory role in a site-specific recombination reaction that resolves ColE1-like plasmid multimers to monomers and is necessary for plasmid stability (Stirling et al., 1988). |
a) Regulatory Interactions. As a product of the work described above, two NLP resources have been created, which we make available to the BioNLP community, especially for tasks of automatic classification, passage detection, and relation extraction. The first is a data set of validated sentences divided into two classes. These sentences were obtained from 142 articles concerning transcriptional regulation.
Description | Class | Total instances | File |
Regulatory interactions without growth condition | RI | 896 | Download |
Regulatory interactions with growth condition | RI+GC | 253 | Download |
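The two classes are unevenly sized, which may matter when these sentences are used to train a classifier. The following is a minimal sketch, not part of the RegulonDB resources, that uses only the instance counts from the table above. The inverse-frequency weighting and the function name `balanced_class_weights` are assumptions made here for illustration; they are one common heuristic, not a method described on this page.

```python
# Instance counts copied from the table above, keyed by its class labels.
counts = {"RI": 896, "RI+GC": 253}


def balanced_class_weights(class_counts):
    """Hypothetical helper: weight each class by n_total / (n_classes * n_class),
    so that the rarer class receives a proportionally larger weight."""
    total = sum(class_counts.values())
    n_classes = len(class_counts)
    return {label: total / (n_classes * n) for label, n in class_counts.items()}


total = sum(counts.values())
# Fraction of all validated sentences that falls in each class.
shares = {label: n / total for label, n in counts.items()}
weights = balanced_class_weights(counts)

for label in counts:
    print(
        f"{label}: {counts[label]} sentences, "
        f"share={shares[label]:.3f}, weight={weights[label]:.3f}"
    )
```

Weights of this kind can be passed to a classifier that accepts per-class weights. Whether that helps on this data set is an empirical question the sketch does not answer.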
b) TF summaries. The second resource is a data set of manual summaries for 178 TFs in text format. These summaries are samples of high-quality curated knowledge covering several properties of TFs, and could be used to develop user-oriented multi-document summarizers in automatic text summarization research.
Description | Total instances | File |
Manual summaries of 178 TFs in text format | 178 | Download |