ACIS configuration

Table of contents

   The main.conf file
   Parameters
      Core
      Perl and CGI
      Database parameters
      Data input (primary metadata: research, institutions, citations, etc.)
      Data output (personal data and submitted institutions)
      Research profile
         Fuzzy search
      APU
      Citations screens
      Logging, debugging and performance profiling
      Other
   Updating a running system’s configuration — bin/setup utility

The main.conf file

The primary configuration file of an ACIS installation is main.conf. It is an AppConfig file, which means it has a simple parameter=value syntax, described in detail in the AppConfig manpage. An example of such a file is provided as main.conf.eg in the ACIS home directory.
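
As a rough illustration of the parameter=value syntax, the following Python sketch parses a small main.conf-like fragment. It is only a sketch: it handles plain "name = value" lines and assumes that "#" starts a comment, and it does not cover the full AppConfig syntax. The sample values and the function name parse_simple_conf are hypothetical.

```python
# Hypothetical sample values, for illustration only.
SAMPLE_CONF = """\
# core parameters
site-name = Example
site-name-long = Example Author Service
admin-email = admin@example.org
session-lifetime = 15
"""

def parse_simple_conf(text):
    """Parse plain 'name = value' lines; skip blank lines and # comments."""
    params = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        name, sep, value = line.partition("=")
        if not sep:
            continue  # this sketch ignores lines without '='
        params[name.strip()] = value.strip()
    return params

conf = parse_simple_conf(SAMPLE_CONF)
print(conf["site-name-long"])
```

All values come back as strings; converting them to numbers or booleans is left out of the sketch.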

Parameters

Core

site-name
Short name of your website. Will be used in generated page titles as a prefix, in emails as [part] of subject lines. The parameter is required.
site-name-long
Full name of the website. Will be shown at the top of each page and used in emails to refer back to the service. Required.
admin-email
Email address of the site administrator, i.e. yourself. Required.
base-url
URL of the ACIS CGI script. If you want ACIS to respond at the top-level URL of your website, set this to something like http://web.site.org; you will then need to configure your web server accordingly (see “How to do that in Apache”). Do not put a trailing slash into this value. If you get this value wrong, ACIS will not correctly interpret users’ requests. Required.
base-cgi-script-filename
Filename of the CGI script, which ACIS creates and which is accessible through the base-url URL. Required.
home-url
URL of the public homepage of the service. The site-name-long value displayed on each page will link to it. May be the same as base-url or different. Required.
static-base-url
URL of a web-accessible directory for ACIS to store its static web-accessible files. ACIS will use it for JavaScript and CSS files, personal profile pages, et cetera. It may be the same as base-url, but you’ll need to configure your webserver accordingly: the webserver has to serve static files by itself and call ACIS for everything else. Required.
static-base-dir
Path to the directory that corresponds to static-base-url. Required.
compact-redirected-profile-urls
This either contains a true value (e.g. 1) or false. Set to true if you want shorter personal profile URLs and have set up your webserver accordingly. The URL of a profile page is built as follows: static-base-url + profile-pages-dir + profile’s unique part + "/". The profile’s unique part is its short-id in slash-separated form, e.g. "p/s/i/d/3". If you enable this option, this part will instead be just the short-id, e.g. "psid3", and the whole URL will be shorter and cleaner. Default value: undef, i.e. false.
profile-pages-dir
Prefix of the profile page URLs. Terminate it with a slash. See the previous item for an explanation. Default value: “profile/”.
session-lifetime
Number of minutes a session lives without a user action until it expires. Default value: 15.
system-email
Value for the “From:” header of email messages that ACIS will send. Required.
sendmail
A mail-sending program name. May be something like /usr/sbin/sendmail -t. Required.
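
The profile URL rule described under compact-redirected-profile-urls and profile-pages-dir can be sketched in Python as follows. This is an illustration, not ACIS code: the function name, the example host and the short-id are hypothetical, and because the text does not say whether static-base-url ends with a slash, the sketch assumes the parts are joined with exactly one slash.

```python
def profile_url(static_base_url, short_id, profile_pages_dir="profile/", compact=False):
    """Build a profile page URL from the documented parts.

    compact mirrors compact-redirected-profile-urls: when true, the
    short-id is used as is; otherwise it is split into one character
    per path segment ("psid3" becomes "p/s/i/d/3").
    """
    unique = short_id if compact else "/".join(short_id)
    # Assumption: normalise to a single slash between base URL and prefix.
    return static_base_url.rstrip("/") + "/" + profile_pages_dir + unique + "/"

print(profile_url("http://static.example.org", "psid3"))
# http://static.example.org/profile/p/s/i/d/3/
print(profile_url("http://static.example.org", "psid3", compact=True))
# http://static.example.org/profile/psid3/
```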

Perl and CGI

perlbin

Perl binary to use for ACIS and its utilities. Default: taken from which perl during bin/setup.

perllibprepend

Type: string. Additional directory for Perl libraries. This option prepends the given directory to the Perl library path in the main CGI file a.cgi. Use it to tell perl where to look for modules (in addition to the standard perl library directories). See also: the @INC entry in the perlvar manpage and the require function manpage.

perllibadd

Type: string. Additional directory for Perl libraries. This option appends the given directory to the Perl library path in the main CGI file a.cgi, by saying BEGIN{ push @INC, 'directory'; }.

cgi-pperl-binary

Type: string. Enables support for Matt Sergeant’s fine PPerl instead of plain old slow CGI. If set, it will be used instead of perl in the CGI script’s shebang line (the first line of the script, which normally says “#!/usr/bin/perl”). It will also cause the CGI script to be adapted in some minor ways to the PPerl environment: some modules will be preloaded and clean-ups will be done after each request is processed.

cgi-pperl-reinit

Type: string. Command to reinitialize or simply shut down the PPerl process. See the previous entry for PPerl pointers. The command will be executed by the bin/setup script after installation or configuration changes.

Database parameters

db-name
Name of the MySQL database that ACIS and its components will use to store their own data. Required.
db-user
Name of the MySQL database user to use when connecting to the MySQL server. Required.
db-pass
Password to use when connecting to the MySQL server. Required.
acis-db-name
sid-db-name
metadata-db-name
All three are deprecated parameters, replaced by the single db-name. They were used to specify MySQL database names for specific components of ACIS. Certain database tables would then go to the “acis” database, some to the “sid” database and the rest to the “metadata” database. They should not be used in new ACIS installations.

Data input (primary metadata: research, institutions, citations, etc.)

metadata-collections
Additional metadata collections to monitor and process with the RI daemon. The variable contains space-delimited collection identifiers for ACIS to process. Each collection must be further defined with a pair of corresponding metadata-X-home and metadata-X-type parameters, where X is the identifier. The identifiers must be unique. Optional.
metadata-X-type
Type of data collection X. Useful possible values: “RePEcRec”, “AMF”, “CitationsAMF” (for citations data) and “FullTextUrlsAMF” (for full-text URLs data).
metadata-X-home
Directory where the files of collection X are.
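
The relationship between metadata-collections and the per-collection metadata-X-type and metadata-X-home parameters can be checked with a short Python sketch such as the one below. It only illustrates the rules stated above (unique identifiers, each with a type and a home); it is not how ACIS itself validates the configuration, and the collection identifiers and paths are hypothetical.

```python
def check_collections(conf):
    """Return (identifier, type, home) triples, or raise ValueError."""
    ids = conf.get("metadata-collections", "").split()  # space-delimited
    if len(ids) != len(set(ids)):
        raise ValueError("collection identifiers must be unique")
    result = []
    for cid in ids:
        ctype = conf.get(f"metadata-{cid}-type")
        home = conf.get(f"metadata-{cid}-home")
        if not ctype or not home:
            raise ValueError(f"collection {cid!r} needs both -type and -home")
        result.append((cid, ctype, home))
    return result

# Hypothetical identifiers and directories.
conf = {
    "metadata-collections": "papers cites",
    "metadata-papers-type": "AMF",
    "metadata-papers-home": "/data/amf/papers",
    "metadata-cites-type": "CitationsAMF",
    "metadata-cites-home": "/data/amf/citations",
}
for cid, ctype, home in check_collections(conf):
    print(cid, ctype, home)
```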

Data output (personal data and submitted institutions)

person-id-prefix
Prefix to the generated personal record identifiers (not short-ids, if you care). Required.
metadata-redif-output-dir
Directory to put generated ReDIF files into. Optional.
metadata-amf-output-dir
Directory to put generated AMF files into. Optional.
institutions-maintainer-email
Email address of the person maintaining the institutions database. When users submit an institution’s data, a message will be sent to this address. Defaults to admin-email, if not specified.

Research profile

See Research Profile document.

research-additional-searches
Type: boolean. Whether or not to run additional, disk- and CPU-intensive database queries as part of automatic research searches. They are not strictly required, but they increase the quality of search under certain conditions; sometimes they find works for the users which would not have been found otherwise. Default: off.
Enable Document to document links screen? Default: false.
full-text-urls-recognition
Enable Full-Text URLs screen in research profile? Default: no.

Fuzzy search

See Fuzzy search in research profile.

Type: Boolean. Whether to run fuzzy matching during automatic research searches. The value does not matter if research-additional-searches is false, because fuzzy searches are a kind of additional search. Default: no, do not run.
fuzzy-name-search-min-common-prefix
The number of characters n at the start of a name variation that have to match in the name expressions exactly. Default: 3.
fuzzy-name-search-min-variation-length
The minimum number of characters m that a name variation must have in order to qualify for fuzzy matching. The default is 7.
fuzzy-name-search-max-name-occurr-in-doc-names
The maximum number of occurrences of a name expression in the document author names table before it is considered for fuzzy matching. The default is 1. If this parameter is set to 0 or is not set, no maximum is checked.
fuzzy-name-search-max-name-occurr-in-name-variations
The maximum number of occurrences of a name expression in the name variations table before it is considered for fuzzy matching. By default, the maximum is 0, i.e. a name expression should not be present among name variations. Set it to -1 to disable this limit.
fuzzy-name-search-via-web
Should fuzzy name searches be run when a research search is initiated by the online user? (When a search is APU-initiated, this is a question of fuzzy-name-search.) Default: false. (This option requires fuzzy-name-search and research-additional-searches to be enabled.)
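
How the fuzzy search switches depend on each other can be summarised in a short Python sketch. It follows only what is stated above: fuzzy matching needs both research-additional-searches and fuzzy-name-search, and a search started by an online user additionally needs fuzzy-name-search-via-web. The function names and the "apu"/"web" labels are hypothetical, and the configuration values are assumed to be already converted to Python booleans and integers.

```python
def fuzzy_search_enabled(conf, initiated_by):
    """initiated_by is 'apu' or 'web' (labels invented for this sketch)."""
    base = bool(conf.get("research-additional-searches")) and bool(
        conf.get("fuzzy-name-search")
    )
    if initiated_by == "web":
        return base and bool(conf.get("fuzzy-name-search-via-web"))
    return base

def variation_long_enough(variation, conf):
    """Apply fuzzy-name-search-min-variation-length (documented default: 7)."""
    return len(variation) >= conf.get("fuzzy-name-search-min-variation-length", 7)

conf = {"research-additional-searches": True, "fuzzy-name-search": True}
print(fuzzy_search_enabled(conf, "apu"))  # True
print(fuzzy_search_enabled(conf, "web"))  # False: via-web is not set
```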

APU

This is about automatic profile update.

minimum-apu-period-days
Type: integer number. This is the minimum number of days between APU runs for a single record. In other words, APU won’t be run for a record if APU was last done for it less than that many days ago. Default: 21 (days). (Running APU more often than this may overwhelm some users when the database is growing, if every APU run finds something and sends an email.)
echo-apu-mails
Type: Boolean. Whether or not to send a copy of all APU mails to the service admin. If set to true, the admin’s email address will be added to the BCC: field.
apu-research-mail-include-approx-hits
Type: Boolean. Include approximate (non-exact) matches into the ARPU mail? Default: no.
apu-research-max-suggestions-in-a-mail
How many research items to list in an ARPU mail at most? Default: no limit.
apu-citations-auto-add-limit
Add no more than this number of citations to a person’s profile in one APU run. Default: no limit.
disable-citation-mails
Type: Boolean. Do not send APU-citation mails (even if changes have been made to a profile during APU).
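
The minimum-apu-period-days rule can be expressed as a small date check. The Python sketch below illustrates the rule as worded above and is not taken from ACIS; the function name is hypothetical, and treating a record that has never had an APU run as due is an assumption.

```python
from datetime import date, timedelta

def apu_due(last_apu, today, minimum_apu_period_days=21):
    """Return True if APU may run again for a record.

    APU is skipped when the last run was less than
    minimum_apu_period_days days ago.
    """
    if last_apu is None:
        return True  # assumption: a record with no previous APU run is due
    return today - last_apu >= timedelta(days=minimum_apu_period_days)

print(apu_due(date(2024, 1, 1), date(2024, 1, 21)))  # False, only 20 days
print(apu_due(date(2024, 1, 1), date(2024, 1, 22)))  # True, 21 days
```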

Citations screens

This is for citations features.

citations-profile
Whether or not to show the citation profile to the users. Boolean value. Default: false.
citation-document-similarity-func

The parameter specifies a Perl function which ACIS will call internally. The function will be called for assessing similarity between a citation string and a document record and must conform to the Citation-document similarity assessment interface.

The function must accept two parameters on input and return a numerical value between 0 (no similarity) and 1 (a perfect match) inclusive.

The default value for this parameter provides a function, documented in section Default citation-document similarity assessment algorithm.

citation-document-similarity-useful-threshold

Citations which have a similarity value less than this threshold won’t even be suggested for the user’s consideration as potential citations.

Default value: 0.65

citation-document-similarity-preselect-threshold

Citations which have a similarity value greater than or equal to this threshold will be offered as pre-selected by default (for a specific document).

Default value: 0.85

citation-document-similarity-ttl

Time-to-live for calculated similarity values, in days. After a similarity value is calculated by the similarity function, the value is stored in the database. Once time-to-live days have passed since the original calculation, the value will be considered expired and will be re-calculated with the then-current similarity function.

Default value: 100

citations-max-online-comparisons

A limit on the number of computationally expensive citation-document comparisons to run for an online user. This is to avoid putting a big load on a running system and thus making it unresponsive to users. Depending on the machine performance and the number of users, setting it to something like 400 may be a good idea.

Default: undefined, i.e. no limit.

citation-presentation-reverse

On the potential citations screen, show HOW the work is cited first, and WHERE it is cited second. For example: “as: …(citation string) in: …(work title) by … (authors)”. This may make looking through a large list of citations easier for the user.

Default: off. So by default each citation is presented this way: “in: … by …” on the first line, and “as: …” on the second.

citations-by-document-search-at-profile-load

Boolean. If true, enables one additional online search: when a user first enters the citation profile (in a session), ACIS immediately executes a search for citations by the user’s document ids. Default: false.

citations-do-not-store-useless-similarity

Boolean. When ACIS runs a citation-document comparison, it usually (by default) stores its value in the database (in the cit_doc_similarity table). Later these comparison results may be reused. If this parameter is set to a true value, then only those comparisons which resulted in a usefully high similarity value will have their result stored in the database.

This is a way to trade performance for database size (i.e. disk space). Default: false.
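
The effect of the two similarity thresholds and of citation-document-similarity-ttl can be illustrated with a short Python sketch. It uses the documented default values (0.65, 0.85 and 100 days); the function names and the returned labels are hypothetical, and the behaviour at exactly time-to-live days is an assumption, since the text does not pin it down.

```python
def classify_citation(similarity, useful=0.65, preselect=0.85):
    """Map a similarity value in [0, 1] to how the citation is offered."""
    if similarity < useful:
        return "not suggested"   # below the useful threshold
    if similarity >= preselect:
        return "preselected"     # offered as pre-selected by default
    return "suggested"           # offered, but not pre-selected

def similarity_expired(age_days, ttl_days=100):
    """True if a stored similarity value should be re-calculated.

    Assumption: a value exactly ttl_days old is still considered fresh.
    """
    return age_days > ttl_days

for value in (0.40, 0.70, 0.90):
    print(value, classify_citation(value))
```

With the default thresholds, the loop reports 0.40 as not suggested, 0.70 as suggested and 0.90 as preselected.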

Logging, debugging and performance profiling

debug-info-visible
Whether or not to append debugging info to each generated web page. Do not enable this on production systems: it can be used by attackers to gain knowledge of the internal site configuration, which is unsafe. It also makes pages much bigger and the system slower. Default: undef, i.e. the feature is off.
debug-log
When set to a writable file name, the processing of each ACIS request will be debug-logged to this file. Verbose. Default: undef, i.e. the feature is off.
extreme-debug
Enables printing of the internal debugging messages on every web page immediately, as ACIS processes a request. Default: undef, i.e. the feature is off.
debug-transformations
ACIS uses XSLT stylesheets to produce HTML pages (and email messages). For each request ACIS generates an XML document and then pipes it to the appropriate XSLT stylesheet. When this option is enabled, ACIS will save the intermediate XML file into {HOME}/presenter_data.xml and the XSLT result into {HOME}/presenter_result.xml. If the files exist, ACIS will overwrite them silently, so at any given time these files will contain only the last request’s data. This option might be helpful for debugging the XSLT stylesheets. Default: undef, i.e. the feature is off.
log-profiling-data
Boolean parameter. Enables logging of system profiling data to the {HOME}/profiling.log. Default: off.
show-profiling-data
Boolean parameter. Enables display of system profiling data (timings) at the end of each page. Default: off.
requests-log
Name of the file to log every incoming request to. Defaults to the string "*stderr*", which means “log to the standard error output filehandle”. Usually, Apache redirects the stderr output of CGI scripts to the error log of the website.

Other

admin-access-pass
Special administrator’s password to access the administrator’s screens. Must be at least 6 characters long. Optional. When not specified, only users who have admin privileges can access the screens, and only while they are logged in. Read more about it.
template-set
Name of a set of XSLT presentation files to use. A set of XSLT presentation files is a directory whose path is given relative to the {HOME}/presentation/ directory. Default value: “default”. So by default XSLT templates are read from {HOME}/presentation/default/.
presenters-dir
Specifies a path in which to search for XSLT templates and some other related files. Overrides template-set if defined. Default: XSLT templates are read from {HOME}/presentation/default/.
umask
The umask to use when creating files and directories. This value directly influences the access permissions of the created files. Read perl’s perldoc -f umask and/or Unix’ man umask. Optional.
require-modules
List of perl modules or simply “.pl” files to load (require) upon system start. Whitespace separated.
backup-directory
Type: string. Path to a directory for bin/backup_tables to put its backups into. Must exist at the time bin/backup_tables is run.

Updating a running system’s configuration — bin/setup utility

main.conf is the main ACIS configuration file, yet it is not read directly by ACIS or any of its components during normal operation. Instead, they read their own separate configuration files, for instance ardb.conf, acis.conf, RI/collections, thisconf.sh, et cetera.

All these files are created and updated from main.conf’s contents by the bin/setup utility.

bin/setup reads main.conf and generates the other necessary files. You need to run it every time you change main.conf so that the configuration changes are reflected in the installation. You may also need to restart the daemon program so that it is aware of the changes you made.

For the other utilities, see the respective section of the administrator’s guide.