Programmer’s guide (Internals)

Table of contents

   Overview
   Data processing
      Summary
   Web interface
      Summary
   A brief guide to the ACIS source files
   A detailed guide to the ACIS source files
      Traditional meta files
      Documentation
      Installation and configuration
      Presenters and other presentation-related files
         General
         Users’ screens
         Person-profile editing screens
         New user (initial registration) screens
         Administrative screens
         Email generation
         Other
      ACIS:: hierarchy
         ACIS::Web — Web interface of ACIS
         Research Profile underlying modules
         APU
         Citations
         Other ACIS:: modules
      Web::App — the web application framework
         ACIS::Data::DumpXML
      ARDB — the data-processing
      SQL helper
      Other

Overview

Internally, ACIS consists of two main subsystems and a number of smaller tools. The two main parts are very different and know little about each other; they work, or can work, largely independently. The first is the data processing subsystem; the second is the web interface subsystem. The glue that makes them perform together as a single application is the predefined and coordinated configuration, some shared modules and a bunch of scripts.

The data processing subsystem processes input metadata. It puts the data into database tables for the web interface to use. The web interface subsystem handles user requests arriving through the web server. Sometimes it creates data files, which the data processing subsystem will then process. The system is complex but consists of simple parts, each with distinct responsibilities.

Data processing

The data processing subsystem, in turn, consists of two loosely coupled parts. The first part monitors data files, tracks their modifications, keeps records of the data found in those files and filters out data records that have bad identifiers. For historical reasons, it is called RePEc-Index, RI for short.

The RePEc-Index is built around a simple idea that metadata comes from collections. Each collection has a name (identifier), a type and is stored in data files somewhere in the filesystem. Metadata is only useful when someone processes it for something. So each collection may have further processing defined for it. All these things are specified in the collections configuration. ACIS creates this configuration for RePEc-Index in the file RI/collections.

In a collection, each file may contain zero, one or many data records. Each data record must have a unique identifier. If two or more records in a collection have the same identifier, there is a conflict. RePEc-Index excludes the conflicting records from further processing, until there’s only one record left with a particular id. When a data record is successfully read from a data file, RePEc-Index does its checks and, if everything is ok, executes further processing for it.
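
The conflict rule above can be illustrated with a small sketch. It is written in Python purely for illustration and is not the RePEc-Index code; the function name, the file names and the identifiers are all hypothetical.

```python
from collections import defaultdict

def split_conflicts(records):
    """Split one collection's records into accepted and conflicting ones.

    records: iterable of (file_name, record_id) pairs (hypothetical shape).
    Returns (accepted, conflicting):
      accepted    maps id -> the single file that holds it,
      conflicting maps id -> sorted list of files that all claim it.
    """
    files_by_id = defaultdict(list)
    for file_name, record_id in records:
        files_by_id[record_id].append(file_name)

    accepted = {rid: files[0] for rid, files in files_by_id.items() if len(files) == 1}
    conflicting = {rid: sorted(files) for rid, files in files_by_id.items() if len(files) > 1}
    return accepted, conflicting

# Two files claim the same identifier, so that identifier is held back.
records = [
    ("a.rdf", "repec:aaa:1"),
    ("b.rdf", "repec:aaa:1"),
    ("b.rdf", "repec:bbb:2"),
]
accepted, conflicting = split_conflicts(records)
```

Once one of the two clashing records is removed from its file, the next pass would find only one record with that id and accept it.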

That is when the second part of data processing comes into play. The second part is ARDB (an abbreviation of Abstract RePEc DataBase, again for historical reasons). RI sends ARDB a record object, and ARDB processes it. Processing a record may mean extracting certain pieces of information from it, running arbitrary Perl code on it, or storing it in a database table. Similarly, RePEc-Index calls ARDB when a record disappears (or when RI discovers an identifier conflict). ARDB then cleans up the database: it removes the data that originated from that record.

ARDB’s work is governed by an elaborate configuration. The configuration is stored in configuration.xml in the ACIS home directory. It specifies what ARDB has to do when processing a record of a specific type. Possible kinds of actions are: store some of the record’s data into a database table, execute certain Perl code, or extract a relationship of this record to some other record. When cleaning up a record, ARDB can remove database table records or call Perl code.
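
To make the idea of configuration-driven processing concrete, here is a minimal Python sketch. It does not reproduce the real configuration.xml format or the ARDB API: the rule table, the action names `store` and `call`, and the helper `note_title` are assumptions made for illustration only.

```python
# Hypothetical rules: record type -> list of (action, argument) pairs.
RULES = {
    "paper": [("store", "papers"), ("call", "note_title")],
}

tables = {"papers": {}}   # stands in for database tables
log = []                  # collects what the custom code did

def note_title(record):
    """Stands in for arbitrary custom code run on a record."""
    log.append(record["title"])

CALLABLES = {"note_title": note_title}

def process(record):
    """Apply every action configured for the record's type."""
    for action, arg in RULES.get(record["type"], []):
        if action == "store":
            tables[arg][record["id"]] = record
        elif action == "call":
            CALLABLES[arg](record)

def cleanup(record_id, record_type):
    """Undo the 'store' actions configured for this record type."""
    for action, arg in RULES.get(record_type, []):
        if action == "store":
            tables[arg].pop(record_id, None)

process({"id": "p1", "type": "paper", "title": "On Metadata"})
stored_after_process = "p1" in tables["papers"]
cleanup("p1", "paper")
stored_after_cleanup = "p1" in tables["papers"]
```

The point of the design is that adding a new record type means adding rules, not changing the processing loop.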

ARDB’s configuration also defines database tables. These may be tables that other parts of the configuration refer to, but they don’t have to be. Either way, ARDB can create those tables for you; there is a script for that (bin/create_tables). ACIS relies on this capability of ARDB to create the tables it needs.

Now, let me outline the workflow of data processing once more. When RePEc-Index finds a new or changed data record, and this record has a valid unique identifier, it asks ARDB to process it. When RePEc-Index finds that a record has been removed from a data file, it asks ARDB to remove the record’s data. Also, when there is an id conflict, RePEc-Index will request ARDB to clean up the conflicting record’s data. The connection between RePEc-Index and ARDB is not intrinsic; it arises only from the way ACIS configures RI to process the metadata.
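
The workflow can be sketched as a comparison of two snapshots of a collection. This is an illustrative Python sketch, not the actual RI code; the function `plan_calls`, the checksum strings and the record ids are hypothetical. It assumes each snapshot holds only valid, non-conflicting records, so a record that becomes conflicting simply drops out of the current snapshot and is cleaned up like a removed one.

```python
def plan_calls(previous, current):
    """Decide which calls to make to the record processor.

    previous, current: dict mapping record id -> checksum of the record
    (hypothetical representation of what the index remembers).
    Returns a list of (operation, record_id) pairs.
    """
    calls = []
    for rid in sorted(current):
        # New records have no previous checksum; changed ones differ.
        if previous.get(rid) != current[rid]:
            calls.append(("process", rid))
    for rid in sorted(previous):
        # Records that vanished (or became conflicting) are cleaned up.
        if rid not in current:
            calls.append(("cleanup", rid))
    return calls

previous = {"r1": "aa", "r2": "bb", "r3": "cc"}
current = {"r1": "aa", "r2": "b2", "r4": "dd"}
calls = plan_calls(previous, current)
```

Here `r1` is unchanged and triggers nothing, `r2` changed and `r4` is new, so both are processed, and `r3` is gone, so its data is cleaned up.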

Summary

Web interface

The web interface part of ACIS is:

There are two main parts in the set of Perl modules that make the ACIS web interface work. The first is the framework. It contains general code for analysing a user’s request, deciding how to treat that request and generating a response. It helps ACIS to store and load user sessions and does a bunch of other little things for a web application. That’s why I call it the web application framework. Its core is in the Web::App module.

The second part contains the specifics of ACIS. This is where the code for ACIS user accounts, searching for research items and research institutions, creating and updating a personal profile and so on lives. This part is rooted at the ACIS::Web module. The framework (i.e. the previous part) makes extending and debugging this part much easier by providing tools and a common environment.

These two parts are tied together by:

  1. inheritance — ACIS::Web class inherits from Web::App and extends it in ACIS-specific ways.

  2. screens.xml — the application configuration.

The application configuration, the screens.xml file, is built around the notion of a screen. Screens are basic units of the web interface; they handle incoming requests and generate responses. Each screen represents a certain piece of web-accessible functionality of the system.

Web::App looks at addresses (URLs) of the requests that arrive. For each request it decides which screen it is for. Each screen definition in the screens.xml configuration specifies which modules and which functions Web::App will invoke for it. If a request comes for an unknown (unspecified) screen, Web::App will generate a 404 error and display a “Sorry” screen.
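
The relationship between the generic framework, the application-specific subclass and the screens configuration can be sketched as follows. This Python sketch only mirrors the ideas described above; it is not the Web::App or ACIS::Web API. The class names, the `handle` and `site_name` methods, the dictionary layout standing in for screens.xml and the `welcome` processor are all assumptions.

```python
class WebApp:
    """Generic framework: maps a request path to a screen and runs its processors."""

    def __init__(self, screens):
        # screens: screen name -> {"processors": [...], "presenter": "..."}
        self.screens = screens

    def handle(self, path):
        screen = self.screens.get(path.strip("/"))
        if screen is None:
            # Unknown screen: 404 status and the "sorry" presenter.
            return 404, "misc/sorry.xsl", {}
        data = {}
        for processor in screen["processors"]:
            processor(self, data)
        return 200, screen["presenter"], data


class AcisWeb(WebApp):
    """Application-specific subclass: extends the framework."""

    def site_name(self):
        return "ACIS"


def welcome(app, data):
    """A screen processor: fills in data for the presenter to render."""
    data["greeting"] = "Welcome to " + app.site_name()


SCREENS = {"welcome": {"processors": [welcome], "presenter": "user/welcome.xsl"}}

app = AcisWeb(SCREENS)
ok = app.handle("/welcome")
missing = app.handle("/no-such-screen")
```

The processors only fill a data structure; turning it into a page is left to the presenter named in the screen definition.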

Another thing that the screens configuration brings into the mix is presenters. For each screen it defines an XSLT file, which is used to generate a response page. This means the application presentation is strictly separate from the main application logic.

All this separation, between the general web-application code and the ACIS-specific code and between presentation and application logic, serves the flexibility and extensibility of the system. It is an attempt to follow the “separation of concerns” principle.

At the same time, the CGI script in ACIS is so simple that I could have left it unmentioned without harm. Basically it creates an object and calls a method on it (or two) and there it ends. The other web-interface-related scripts mostly act in a similar way: they invoke a certain part of ACIS::Web and it does the rest.

Summary

A brief guide to the ACIS source files

Main parts and some key files

home/
the files and directories here are copied into the ACIS home directory during installation.
home/bin/
some scripts used during installation and afterwards
home/bin/setup
bin/setup script source
home/bin/templates/
each *.pl file here is slightly transformed and then saved into a script in $home/bin/ during installation
home/presentation/
All XSLT templates for screens and emails are there. And some other presentation-related files (like CSS and JS).
home/configuration.xml
This is configuration for ARDB; it defines database tables and how to process certain types of input data.
home/screens.xml
Screens definition file for ACIS::Web (and Web::App)
doc/
Documentation and files needed to regenerate it from its sources.
doc/*.text
documentation source files in Markdown format.
doc/*.html
ready to read documentation files in HTML
doc/xslt/
XSLT templates used to generate HTML
lib/
most of the perl code is here
lib/ARDB/
abstract RePEc database system, which helps in metadata processing
lib/ACIS/
most of the ACIS-specific code is here
lib/ACIS/Web/
Assorted modules which handle ACIS screens via ACIS::Web and Web::App
lib/Web/App.pm
lib/Web/App/
Web::App framework and related files
EPrints/
Stuff for EPrints, see this doc for details
sql_helper/
A couple of modules which help ACIS deal with SQL database.
install.sh
Installation script

A detailed guide to the ACIS source files

This is an annotated list of most files of an ACIS distribution.

Traditional meta files

COPYING
GNU General Public License, version 2
MANIFEST
Full list of files, just the names
MANIFEST.SKIP
Makefile.PL
Perl script, which creates Makefile. It is not needed for installation, but useful for re-packaging a distribution (for instance, if you hacked one).
README
The most thrilling Agatha Christie’s detective novel. Serious.
TODO
The development plan and a bug/issue list

Documentation

doc/make.linked.pl
Generates HTML with table of contents, with indirect links between the pages… Requires Markdown.pl and xsltproc in PATH.
doc/make.simple.pl
Quickly generates HTML from the *.text files; requires Markdown.pl in PATH.
doc/check-filelist.pl
Compares lists of files in doc/internal.text and in MANIFEST. Obsolete.
doc/style.css
CSS for HTML documentation
doc/xslt/010.xsl
doc/xslt/020.xsl
doc/xslt/030.xsl
XSLT stylesheets which generate the interlinked HTML docs, used by doc/make.linked.pl
doc/index.html
doc/adm.html
doc/apache-conf.html
doc/conf.html
doc/install.html
doc/bdb-private.html
doc/overview.html
doc/researchprofile.html
doc/apu.html
doc/cooperate.html
doc/eprints.html
doc/eprints-install.html
doc/citations.html
doc/db.html
doc/daemon.html
doc/internal.html
Documentation in HTML
doc/*.text
Corresponding documentation sources in Markdown syntax with some additional custom markup.

Installation and configuration

home/bin/conf.pl
Reads main.conf and creates thisconf.sh, ardb.conf, acis.conf
home/bin/rid
Update daemon start/stop utility, see bin/rid
home/bin/setup
bin/setup utility template
home/bin/setup.cgi_frontend
home/bin/setup.logs-browsing
home/bin/setup.ri_collections
home/bin/setup.ri_local_setup.pm
home/bin/setup.sid_local.pm
Small scripts to create local configuration for this and that; used by bin/setup.
home/bin/templates/apu.pl
Template for bin/apu script — APU
home/bin/templates/clean-up.pl
Template for bin/clean-up script
home/bin/templates/create_tables.pl
Template for bin/create_tables script
home/bin/templates/upgrade_to_*.pl
Templates for the version-specific upgrade scripts.
home/bin/templates/*.pl
Templates for other utilities and scripts.
home/configuration.xml
ARDB configuration for ACIS’ primary metadata, for processing ReDIF and AMF
home/contributions.conf.xml
ACIS research item types and personal roles configuration
main.conf.eg
Example file for main.conf.
home/screens.xml
ACIS::Web’s (Web::App’s) web application configuration. Defines screens, their processors and presenters and some additional parameters.
install.sh
Installation/upgrade script. Creates ACIS home structure, if necessary. Copies all the files. Runs bin/setup if main.conf exists.

Presenters and other presentation-related files

General

home/presentation/default/global.xsl
Global variables definition; they are global in the sense that they are used by many other templates and are available for use almost everywhere. Also provides the show-status template.
home/presentation/default/page.xsl
The most important template of all. Defines both technical details of each HTML page that is generated (the page template) and a special markup, used all over the place. Directly or indirectly it is used by every HTML-page presenter of ACIS.
home/presentation/default/forms.xsl
Special utility template for forms generation.
home/presentation/default/page-universal.xsl
Provides the appropriate-page and appropriate-page-soft templates, which display new-user-page to new users and user-page to returning users.
home/presentation/default/errors.xml
Definitions of error messages, code => message. The codes are used throughout the ACIS::Web:: and Web::App:: hierarchies. See error() method in Web::App.
home/presentation/default/messages.xml
Messages, which are like errors, invoked by a code in Perl in ACIS::Web::… and Web::App::…. See message() method in Web::App.
home/presentation/default/fields.xml
Field names for form value errors reporting, used by show-status template.
home/presentation/default/fields-institution.xml
A replacement for fields.xml (see previous item) in case of the new-institution screen.
home/presentation/default/index.xsl
Template for the ACIS homepage
home/presentation/default/misc/login-pass.xsl
Asks for the password.
home/presentation/default/misc/login.xsl
Asks for the login and password.
home/presentation/default/misc/sorry.xsl
Displayed in case of a 404 page not found error or when access is denied.
home/presentation/default/misc/local-document.xsl
See ACIS::Web::Site.
home/presentation/default/phrase.xml
Contains default values for phrases, invoked through <phrase ref=’…’/> markup elsewhere in templates. Such a phrase element will be replaced with content from this file or its installation-local equivalent {HOME}/presentation/default/phrase-local.xml.
home/presentation/default/script/main.js
JavaScript
home/presentation/default/script/jquery.js
jQuery JavaScript library
home/presentation/default/style/brownish.css.add
This is joined with main.css to get brownish.css — the brownish color theme.
home/presentation/default/style/ie-font-sizes.css
Additional CSS for IE, as a hack to solve jumping font-size problem in IE 6/Win and IE 5.5/Win.
home/presentation/default/style/main.css
Main CSS file

Users’ screens

home/presentation/default/user/page.xsl
user-page template for all users’ screens.
home/presentation/default/user/welcome.xsl
When user has just logged in.
home/presentation/default/user/settings.xsl
The settings screen.
home/presentation/default/user/good-bye.xsl
Displayed after log-off
home/presentation/default/user/records-menu.xsl
For advanced users who have several records.
home/presentation/default/user/unregister.xsl
home/presentation/default/user/account-deleted.xsl
Deleting a user account.

Person-profile editing screens

home/presentation/default/person/page.xsl
Provides elements, which are specific to the person-editing screens, e.g. person profile menu.
home/presentation/default/person/affiliations-common.xsl
home/presentation/default/person/affiliations-ir-guide.xsl
home/presentation/default/person/affiliations-search.xsl
home/presentation/default/person/affiliations.xsl
home/presentation/default/person/affiliations/new-institution.xsl
Affiliations and “submit institution” screens
home/presentation/default/person/contact.xsl
Contact info screen
home/presentation/default/person/name.xsl
Name details screen
home/presentation/default/person/interests.xsl
home/presentation/default/person/photo.xsl
There was an idea of capturing scientific interests and pictures of the users. Not in use anymore.
home/presentation/default/person/profile-overview.xsl
Profile overview screen.
home/presentation/default/person/profile-static.xsl
Generates profile static page
home/presentation/default/person/profile-show.xsl
Displays content of a personal profile, used by profile-overview.xsl and profile-static.xsl
home/presentation/default/person/research/*.xsl
Research profile screens and utilities.
home/presentation/default/citations/*.xsl
Citations-related screens
home/presentation/default/person/generic.xsl
A template for new person-editing screens, a stub

New user (initial registration) screens

home/presentation/default/new-user/page.xsl
The new-user-page template.
home/presentation/default/new-user/initial.xsl
home/presentation/default/new-user/additional.xsl
home/presentation/default/new-user/complete.xsl
home/presentation/default/new-user/confirm.xsl

Administrative screens

home/presentation/default/adm/events-decode.xsl
/adm/events/decode screen
home/presentation/default/adm/events-raw.xsl
/adm/events/raw screen
home/presentation/default/adm/events.xsl
Currently unused bits for the events screens; might be useful in the future or will be thrown away.
home/presentation/default/adm/index.xsl
/adm screen
home/presentation/default/adm/pass.xsl
Asks for the password for /adm/… screens
home/presentation/default/adm/search-res-doc.xsl
home/presentation/default/adm/search-res-rec.xsl
home/presentation/default/adm/search-res-usr.xsl
home/presentation/default/adm/search.xsl
/adm/search screen and its result presenters
home/presentation/default/adm/session-deleted.xsl
home/presentation/default/adm/session.xsl
home/presentation/default/adm/sessions.xsl
/adm/sessions screen and related
home/presentation/default/adm/sql.xsl
/adm/sql screen

Email generation

home/presentation/default/email/general.xsl
Email message generation general template and utilities.
home/presentation/default/email/*.xsl
Templates for email messages for different occasions

Other

home/presentation/default/misc/forgotten-password.xsl
Forgotten password reminder screen.
home/presentation/default/widgets.xsl
Widgets for use elsewhere. Tabset template for the research profile.
home/presentation/default/indent.xsl
A helper template for indenting text; used by home/presentation/default/export/redif.xsl
home/presentation/default/export/amf-person.xsl
Generates AMF person data
home/presentation/default/export/redif.xsl
Generates ReDIF person template
home/presentation/default/misc/time.xsl
Converts number of seconds into a human-readable English phrase.
home/presentation/default/misc/time-test-data.xml
A piece of testing data for home/presentation/default/misc/time.xsl.
home/presentation/default/stub.xsl
A stub for an XSL file.

ACIS:: hierarchy

ACIS::Web — Web interface of ACIS

lib/ACIS/Web.pm
ACIS::Web module; inherits from Web::App and specifies some important details of how exactly the Web::App is used for ACIS.
lib/ACIS/Web/Admin.pm
Code behind the main administrative screens. Also provides tools for some other modules.
lib/ACIS/Web/Admin/Events.pm
Powers /adm/events/decode and /adm/events/raw.
lib/ACIS/Web/Affiliations.pm
Affiliations screen, both for the initial registration and returning users.
lib/ACIS/Web/Background.pm
ACIS::Web::Background module. It’s all about forking a process and keeping track of the forked processes.
lib/ACIS/Web/CGI/Untaint/latinname.pm
lib/ACIS/Web/CGI/Untaint/name.pm
lib/ACIS/Web/CGI/Untaint/password.pm
lib/ACIS/Web/CGI/Untaint/simpleemail.pm
lib/ACIS/Web/CGI/Untaint/url.pm
CGI input parameters checking plugins for CGI::Untaint.
lib/ACIS/Web/Config.pm
List of local configuration parameters and their defaults.
lib/ACIS/Web/Contributions.pm
Research profile screens
lib/ACIS/Web/Export.pm
Exports personal data in ReDIF and AMF.
lib/ACIS/Web/Import.pm
Imports personal data from ReDIF.
lib/ACIS/Web/NewUser.pm
Code behind the initial registration screens.
lib/ACIS/Web/Person.pm
Some personal profile-specific code.
lib/ACIS/Web/SaveProfile.pm
Saves a profile as a static HTML page.
lib/ACIS/Web/Services.pm
Some general services for many screens and processors. Starts and loads sessions, handles authentication, checks form values, provides form-field values, sets and clears cookies…
lib/ACIS/Web/Session.pm
lib/ACIS/Web/Session/SMagic.pm
lib/ACIS/Web/Session/SNewUser.pm
lib/ACIS/Web/Session/SOldUser.pm
Session classes
lib/ACIS/Web/Site.pm
Serves local files from {HOME}/site directory as static content HTML pages. Uses home/presentation/default/misc/local-document.xsl. Invoked by ACIS::Web.
lib/ACIS/Web/SysProfile.pm
Manages acis.sysprof table; saves and loads parameter-value pairs for a particular user or record. This is used by research profile and ARPU. Generally useful when ACIS needs to remember something about a user or a record, without writing it to his/her userdata.
lib/ACIS/Web/User.pm
Registered users’ screens: name, settings, et cetera.
lib/ACIS/Web/UserData.pm
User account storage module. Reads and writes userdata files.
lib/ACIS/Web/Citations.pm
Citations-related screens

Research Profile underlying modules

lib/ACIS/Resources/Search.pm
General tools for research database search. Used for the manual search screen.
lib/ACIS/Resources/AutoSearch.pm
Research profile’s automatic search, based on the user’s name variations. May be called in a forked background process (via ACIS::Web::Background).
lib/ACIS/Resources/Suggestions.pm
Management of the rp_suggestions table.
lib/ACIS/Resources/SearchFuzzy.pm
Fuzzy name-variation-based search for documents.

APU

lib/ACIS/APU.pm
Automatic Profile Update system, the core
lib/ACIS/APU/Queue.pm
Queue management part
lib/ACIS/APU/RP.pm
APU for research profile

Citations

lib/ACIS/Citations/Input.pm
citations input processing, maintenance of the citations table (and citations_deleted)
lib/ACIS/Citations/Profile.pm
maintenance of the citations profile in the userdata
lib/ACIS/Citations/Search.pm
search for citations
lib/ACIS/Citations/SimMatrix.pm
citations-documents similarity matrix class
lib/ACIS/Citations/CitDocSim.pm
citation-document matching
lib/ACIS/Citations/Utils.pm
assorted utilities, used by the other modules
lib/ACIS/Citations/Suggestions.pm
citation suggestions storage, management of the cit_sug and cit_old_sug tables
lib/ACIS/Citations/AutoUpdate.pm
APU for citations profile
lib/ACIS/Citations/Events.pm
citations_events table maintenance

Other ACIS:: modules

lib/ACIS/Misc.pm
Several uncategorized functions used elsewhere
lib/ACIS/ShortIDs.pm
Short-id generation and database management module
lib/ACIS/UserData/Data/Record.pm
Class for records branch of userdata of an ACIS user; implements ARDB::Record interface.
lib/ACIS/UserData/User/Record.pm
Class for data/owner branch of userdata of an ACIS user; implements ARDB::Record interface.

Web::App — the web application framework

lib/Web/App.pm
Web::App — the core of the web applications framework. Designed to be generic, although some project-specific stuff is still there.
lib/Web/App/Common.pm
Some utilities for general consumption.
lib/Web/App/XSLT.pm
Add-on to Web::App for XSLT presentation building (using XML::LibXSLT)
lib/Web/App/FormsProcessing.pm
checking and processing form parameters, preparing input values for the forms
lib/Web/App/Config.pm
placeholder
lib/Web/App/Config/Parse.pm
Parses screens.xml and site-local configuration (e.g. acis.conf). This is used to make the config.bin file during ACIS setup. That file is then reused every time an ACIS::Web object is created. The module is loaded by the parse_config() method of Web::App.
lib/Web/App/Email.pm
Web::App::Email, provides email-sending services to other modules.
lib/Web/App/EmailFormat.pm
Used by Web::App::Email to format messages.
lib/Web/App/Screen.pm
A screen configuration container class, very simple. (One object — one screen.)
lib/Web/App/Session.pm
Web::App::Session — session class.
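
The parse-once, reuse-often approach described for Web::App::Config::Parse and config.bin can be sketched in Python as follows. This is only an illustration of the caching idea: the actual format of config.bin and of the configuration files is not shown here, so the `key = value` syntax, the use of `pickle`, and the function names `parse_settings`, `build_cache` and `load_cache` are all assumptions.

```python
import os
import pickle
import tempfile

def parse_settings(text):
    """Stand-in for the expensive parse: reads 'key = value' lines."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if line and not line.startswith("#"):
            key, _, value = line.partition("=")
            settings[key.strip()] = value.strip()
    return settings

def build_cache(text, cache_path):
    """Setup step: parse once and save the result in binary form."""
    with open(cache_path, "wb") as handle:
        pickle.dump(parse_settings(text), handle)

def load_cache(cache_path):
    """Run-time step: load the ready-made structure without re-parsing."""
    with open(cache_path, "rb") as handle:
        return pickle.load(handle)

with tempfile.TemporaryDirectory() as workdir:
    cache_path = os.path.join(workdir, "config.bin")
    build_cache("# local settings\nsite-name = Example\n", cache_path)
    config = load_cache(cache_path)
```

The trade-off is that the cache must be rebuilt whenever the source configuration changes, which is why it belongs to the setup step.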

ACIS::Data::DumpXML

ACIS::Data::DumpXML is an XML-serializer, used by ACIS::Web (and Web::App). ACIS::Data::DumpXML::Parser is the de-serializer. Technically they are in ACIS:: hierarchy, but logically they belong to Web::App::.

lib/ACIS/Data/DumpXML.pm
ACIS::Data::DumpXML module. Converts a perl data structure into XML.
lib/ACIS/Data/DumpXML/Parser.pm
Converts XML, created by ACIS::Data::DumpXML, into a perl data structure.
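
The serializer and de-serializer pair can be illustrated with a round trip in Python. The XML vocabulary below (`data` and `item` elements with `type` and `key` attributes) is invented for this sketch and is not the format that ACIS::Data::DumpXML produces; the functions `to_xml` and `from_xml` are hypothetical as well. The sketch handles only nested dictionaries with string keys, lists and strings.

```python
import xml.etree.ElementTree as ET

def to_xml(value, tag="data"):
    """Convert a nested dict/list/str structure into an XML element."""
    elem = ET.Element(tag)
    if isinstance(value, dict):
        elem.set("type", "hash")
        for key, item in value.items():
            child = to_xml(item, "item")
            child.set("key", key)
            elem.append(child)
    elif isinstance(value, list):
        elem.set("type", "list")
        for item in value:
            elem.append(to_xml(item, "item"))
    else:
        elem.set("type", "text")
        elem.text = str(value)
    return elem

def from_xml(elem):
    """Rebuild the data structure from an element made by to_xml()."""
    kind = elem.get("type")
    if kind == "hash":
        return {child.get("key"): from_xml(child) for child in elem}
    if kind == "list":
        return [from_xml(child) for child in elem]
    return elem.text or ""

profile = {"name": "Jane Doe", "papers": ["p1", "p2"]}
xml_text = ET.tostring(to_xml(profile), encoding="unicode")
restored = from_xml(ET.fromstring(xml_text))
```

Because the type of each node is recorded in the XML itself, the parser can restore hashes and lists without any external schema.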

ARDB — the data-processing

Abstract metadata database system. Analyses independent metadata records and stores them in a database according to a configuration. Extracts and stores relations between records. Can retrieve records back, if necessary.

lib/ARDB.pm
The core, the ARDB module
lib/ARDB/Configuration.pm
Parses configuration, configuration.xml
lib/ARDB/ObjectDB.pm
Stores a record in the objects table, for later use.
lib/ARDB/Table.pm
ARDB::Table class; represents a database table and provides table-related methods, like create, delete, store a record, et cetera.
lib/ARDB/Common.pm
General utilities, logging and diagnostics.
lib/ARDB/Plugins.pm
lib/ARDB/Record.pm
ARDB::Record, abstract class for metadata objects, that ARDB can store and process.
lib/ARDB/Record/ReDIF.pm
Implementation of ARDB::Record for ReDIF templates.
lib/ARDB/Record/Simple.pm
An implementation of ARDB::Record for a simple one-level hash. Currently not used anywhere in ACIS.
lib/ARDB/RelationType.pm
Container class for relation type objects, which are specified in an ARDB configuration.
lib/ARDB/Relations.pm
ARDB::Relations class. Manages relations table; stores and retrieves relations between records (objects).
lib/ARDB/Relations/Transaction.pm
A transaction level on top of ARDB::Relations. Changes to a transaction are not saved until the transaction is committed.
lib/ARDB/Setup.pm
Checks ARDB configuration (both local and application-specific) and stores it into an ARDB::Local Perl module file to avoid re-parsing and re-checking it every time.
lib/ARDB/Plugin/Processing.pm
lib/ARDB/Plugin/Processing/ACIS_UD.pm
lib/ARDB/Plugin/Processing/HoPEc.pm
lib/ARDB/Plugin/Processing/ShortIDs.pm
lib/ARDB/SiteConfig.pm
Container class for local site’s configuration, such as DB access parameters.
lib/ARDB/Test.pm
A simple testing framework. It is for ARDB test scripts in t/.
lib/ARDB/ReDIF/Processing.pm
ReDIF metadata processing code
lib/ARDB/AMF/Processing.pm
AMF metadata processing code
lib/ARDB/RI.pm
Interface module, from RePEc::Index to ARDB
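
The transaction layer listed above (ARDB::Relations::Transaction) holds changes back until commit. A minimal Python sketch of that idea follows. It is not the ARDB API: the classes, the method names `store`, `remove` and `commit`, and the triple layout are assumptions, and an in-memory set stands in for the relations table.

```python
class Relations:
    """Stands in for the relations table: a set of triples."""

    def __init__(self):
        self.rows = set()   # (subject, relation, object)


class Transaction:
    """Collects changes and applies them to Relations only on commit."""

    def __init__(self, relations):
        self.relations = relations
        self.to_add = set()
        self.to_remove = set()

    def store(self, subject, relation, obj):
        triple = (subject, relation, obj)
        self.to_remove.discard(triple)
        self.to_add.add(triple)

    def remove(self, subject, relation, obj):
        triple = (subject, relation, obj)
        self.to_add.discard(triple)
        self.to_remove.add(triple)

    def commit(self):
        # Only now do the pending changes reach the underlying store.
        self.relations.rows -= self.to_remove
        self.relations.rows |= self.to_add
        self.to_add.clear()
        self.to_remove.clear()


relations = Relations()
tx = Transaction(relations)
tx.store("person:1", "wrote", "paper:7")
before_commit = set(relations.rows)
tx.commit()
after_commit = set(relations.rows)
```

Discarding the transaction object without calling `commit()` leaves the stored relations untouched, which is the behaviour the description above asks for.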

SQL helper

SQL helper module, used throughout ACIS. Provides a conveniently-wrapped interface to MySQL with problem logging.

sql_result is the class for SQL query results.

sql_helper/MANIFEST
sql_helper/MANIFEST.SKIP
sql_helper/Makefile.PL
sql_helper/sql_helper.pm
sql_helper/sql_result.pm

Other

lib/RePEc/Index/Collection/ACIS_UD.pm
Implementation of the “ACIS_UD” collection type for RePEc-Index (update daemon). ACIS_UD is ACIS userdata files collection.
lib/RePEc/Index/Collection/CitationsAMF.pm
Implementation of the “CitationsAMF” collection type for RePEc-Index.
lib/RePEc/Index/Collection/FullTextUrlsAMF.pm
Implementation of the “FullTextUrlsAMF” collection type for RePEc-Index. This is for the FullText URLs input data.