Automatic Profile Update

Table of contents

   Introduction & overview
   Some older terms
   The bin/apu script
   How to run bin/apu?
   Configuration parameters
   Technical: Queue management
   Technical: Responsible perl code

Introduction & overview

Automatic Profile Update is a subsystem of ACIS, which performs a number of operations on user profiles automatically. It serves several purposes, but the primary one is to make maintaining a profile in ACIS less of a chore for the user.

The metadata dataset of a running ACIS service is expected to be dynamic, with new items appearing and some old items disappearing (for a number of possible reasons, even if undesired). So one specific thing APU does is find new documents in the documents database that a registered person might have written. If there is a highly probable match (e.g. an exact name match), APU automatically adds the document to the person’s research profile.

On the other hand, if a document was claimed by the user but has since disappeared from the documents database, APU removes it from the person’s profile. (A special grace period is observed to allow for short-term database fluctuations, when items disappear for a while and then reappear.)
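The grace-period rule can be sketched as follows. This is only an illustration of the idea, not the ACIS implementation: the function name should_remove, the missing_since timestamp and the 14-day value are all assumptions made for the example, since the actual period is not stated here.

```python
from datetime import datetime, timedelta

# Hypothetical value: the real grace period used by ACIS is not given in this document.
GRACE_PERIOD = timedelta(days=14)

def should_remove(missing_since, now, grace_period=GRACE_PERIOD):
    """Return True if a claimed document has been missing for longer than the grace period.

    missing_since is the time the document was first seen to be absent from the
    documents database, or None if it is currently present.
    """
    if missing_since is None:
        return False  # document is present, nothing to clean out
    return now - missing_since > grace_period

now = datetime(2024, 1, 31)
print(should_remove(datetime(2024, 1, 25), now))  # missing for 6 days: kept
print(should_remove(datetime(2024, 1, 1), now))   # missing for 30 days: removed
```

A document that reappears before the period expires would simply have its missing_since reset to None, so a short-term fluctuation never touches the profile.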

When APU runs a search for a personal profile and either adds something to the profile or finds items for the user to consider, it sends an email to the user, notifying her of the items found or the changes made.

APU also handles the citation profile. After examining the person’s research profile and running the research items search, APU looks for new citations of the research profile (RP) items. If there are any, it either automatically identifies them with RP items or stores them in a database for the user’s consideration. Likewise, if something of interest is found, it emails the user in charge of the profile.

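So at this time APU performs three functions for personal profiles: it adds newly found documents to the research profile, it removes documents that have disappeared from the documents database, and it finds new citations for the research profile items.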

Some older terms

ARPU
Automatic Research Profile Update. It used to be a separate feature, but is now a part of APU.

The bin/apu script

APU runs for users while they are not logged in and not interacting with the service, so it runs separately from the main web application interface. All the APU-related functions are available via the bin/apu script.

Usage:

$ bin/apu [options] [<number>]

Run APU processing for the next personal record on the queue.

If <number> is given, APU is run for that many items in the APU queue. By default it is run for just one record.

The command produces no output on successful execution, but writes its main steps to the log file autoprofileupdate.log.

$ bin/apu [options] queue <identifier>

Put the record <identifier> onto the APU queue. <identifier> may be a short-id, an id, or a user account email address.

The command won’t produce any output on success.

The possible options are:

--debug
Print debugging output for the whole process to stdout. The output is very verbose: for a large profile it may run to hundreds of screenfuls.
--interactive
--inter
Duplicate the autoprofileupdate.log log messages on the standard output. This gives you a general idea of what is going on while APU is running.
--failed
Retry APU for the previously failed items in the APU queue, if there are any.
--noauto
Do not automatically clear and re-populate the queue table when the end of the queue is reached.

How to run bin/apu?

Run it at low-load periods, or regularly as a cron job.

The frequency of the APU runs and the number argument you use are important. Make them high enough to cover all the users.

For example, if you have roughly 300 personal records, it will be more than enough to run “bin/apu 4” three times a day. It will go through 12 records a day, about 360 per month.
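The arithmetic behind this example can be checked with a few lines of Python. The function name days_for_full_cycle is made up for this sketch; it only restates the calculation above and assumes that no queue items are skipped.

```python
import math

def days_for_full_cycle(total_records, records_per_run, runs_per_day):
    """Days needed to run APU once for every record, assuming none are skipped."""
    records_per_day = records_per_run * runs_per_day
    return math.ceil(total_records / records_per_day)

records_per_day = 4 * 3          # "bin/apu 4" run three times a day
print(records_per_day)           # 12 records a day
print(records_per_day * 30)      # 360 records in a 30-day month
print(days_for_full_cycle(300, 4, 3))  # 25 days to cover 300 records
```

With these numbers every profile is visited about once every 25 days, which leaves some room for growth in the number of records before the schedule needs adjusting.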

If you run it too often, it will start skipping queued items: the script skips a record if APU was already done for it recently. See the minimum-apu-period-days configuration parameter.
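The skipping rule might look roughly like this. It is a sketch under assumptions: the function name should_skip, the last_apu timestamp and the 21-day value are hypothetical, and the exact comparison the script makes against minimum-apu-period-days is not documented here.

```python
from datetime import datetime, timedelta

def should_skip(last_apu, now, minimum_apu_period_days):
    """Return True if APU ran for this record less than the minimum period ago.

    last_apu is the time of the previous APU run for the record, or None if
    APU has never been run for it.
    """
    if last_apu is None:
        return False  # never processed, so there is nothing to wait for
    return now - last_apu < timedelta(days=minimum_apu_period_days)

now = datetime(2024, 3, 1)
# 21 is an illustrative value for minimum-apu-period-days, not a documented default.
print(should_skip(datetime(2024, 2, 20), now, 21))  # 10 days ago: skipped
print(should_skip(datetime(2024, 1, 1), now, 21))   # 60 days ago: processed
```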

Configuration parameters

These are the APU-related ACIS configuration parameters (to be used in the main.conf configuration file):

Technical: Queue management

APU stores its queue in the apu_queue table.

The apu_queue table:

field      description
what       record id
position   queue item number
filed      date & time when it was put onto the queue
class      should this item be treated with a priority?
notes      any messages during APU execution
worked     date & time when it was executed

Initially, APU puts every known personal record onto the queue. As APU processing is executed for a record, the queue item status is changed from an empty string to “OK” or “FAIL” in the queue table. Normally APU runs on empty-status queue items.

Then when it reaches the end of the queue, it deletes the table content and starts over again. This usually happens automatically.
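The queue life cycle described above can be modelled in a few lines. This is a simplified in-memory sketch, not the ACIS::APU::Queue code: the names make_queue and run_apu, the status key and the process callback are assumptions for illustration, and the real queue lives in the apu_queue database table.

```python
def make_queue(record_ids):
    """Put every known personal record onto a fresh queue with an empty status."""
    return [{"what": rid, "position": pos, "status": ""}
            for pos, rid in enumerate(record_ids, start=1)]

def run_apu(queue, all_record_ids, process, count=1, auto=True):
    """Run APU for up to `count` empty-status items and return the ids handled.

    If no empty-status items are left and `auto` is true, the queue is cleared
    and re-populated first (the behaviour that --noauto switches off).
    """
    pending = [item for item in queue if item["status"] == ""]
    if not pending and auto:
        queue[:] = make_queue(all_record_ids)  # delete the content and start over
        pending = list(queue)
    handled = pending[:count]
    for item in handled:
        # process() stands in for the real APU work and returns True on success.
        item["status"] = "OK" if process(item["what"]) else "FAIL"
    return [item["what"] for item in handled]

ids = ["p1", "p2", "p3"]
queue = make_queue(ids)
fails_on_p2 = lambda rid: rid != "p2"   # pretend that APU fails for record p2

print(run_apu(queue, ids, fails_on_p2, count=2))  # ['p1', 'p2']
print([item["status"] for item in queue])         # ['OK', 'FAIL', '']
print(run_apu(queue, ids, fails_on_p2, count=2))  # ['p3']
print(run_apu(queue, ids, fails_on_p2, count=2))  # queue refilled: ['p1', 'p2']
```

In this model a --failed run would select the items whose status is “FAIL” instead of the empty-status ones.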

Technical: Responsible perl code

These modules in ACIS are responsible for APU:

ACIS::APU
General APU logic. Used directly by the apu script.
ACIS::APU::Queue
Provides APU queue management.
ACIS::APU::RP
Does research-profile processing.
ACIS::Citations::AutoUpdate
Does citations-profile automatic processing.

Scripts:

home/bin/template/apu.pl
Provides a command-line interface to the ACIS::APU module. It is the basis for the bin/apu script.