Citations in ACIS
=================
In the potential tab we offer citations for the user to identify. We only offer citations that we found similar to the description of the document. Two kinds of citations may be offered: "new" and "old". Old citations are the ones that we have already suggested to the user for this particular document in the past. (Thus a citation can only be old with respect to a certain document.) A citation becomes old only after we have suggested it to the user and the user has submitted the form without selecting this citation for inclusion. If she did not submit the form that included this citation as new, we will still consider it new.
We present the potential citations as shown in the prototype page linked above. The old citations will not be shown by default, only if the user requests it. Both old and new citations are ordered by decreasing similarity of the citation to the document. Some of the new citations, namely those whose similarity is higher than a certain threshold, will have the checkbox checked by default. The threshold level is configurable by the system administrator.

The "Not my work" button removes a citation from the pool of potential citations for all the documents, and puts it into the refused citations list.

If the user presses the SUBMIT FORM button, all checked citations will be identified as pointing to that document. They will no longer be suggested for identification for other documents.

When there is at least one other document with new citations for it, we show a checkbox near the submit button, labeled "Show next document with new citations". This checkbox will be checked by default if the current page offers any new citations. If the checkbox is checked when the form is submitted, we process the user's choices and then redirect her to the next most interesting document.

If we have no potential citations for the document in question, we will show a message saying: "We have no citations to suggest for this document at this time."
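
The presentation rules above can be sketched in Perl roughly as follows. This is only an illustration under assumptions, not the ACIS implementation: the function name `prepare_suggestions`, the hash keys `similarity`, `old` and `checked`, and the threshold value 0.85 are all hypothetical and chosen for the example.

```perl
use strict;
use warnings;

# Hypothetical helper: split suggestions into new and old lists,
# order each by decreasing similarity, and pre-check the new ones
# whose similarity is higher than the configured threshold.
sub prepare_suggestions {
    my ( $suggestions, $threshold ) = @_;

    my @sorted = sort { $b->{similarity} <=> $a->{similarity} } @$suggestions;

    my ( @new, @old );
    foreach my $s ( @sorted ) {
        my %item = %$s;    # copy, so the caller's data is not modified
        if ( $item{old} ) {
            push @old, \%item;    # old citations are never pre-checked
        }
        else {
            $item{checked} = $item{similarity} > $threshold ? 1 : 0;
            push @new, \%item;
        }
    }
    return ( \@new, \@old );
}

# Example data (made up); the threshold would come from the
# system configuration.
my @suggestions = (
    { cnid => 1, similarity => 0.62, old => 0 },
    { cnid => 2, similarity => 0.91, old => 0 },
    { cnid => 3, similarity => 0.75, old => 1 },
);
my ( $new, $old ) = prepare_suggestions( \@suggestions, 0.85 );
```

Here `$new` would be rendered on the page by default, while `$old` would be rendered only when the user asks to see old citations.
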
FULLTEXT INDEX (nstring), INDEX (trgdocid)
is a reference to a hash:

    { ostring => CITSTR,
      nstring => NORCITSTR }

Here CITSTR is the original citation string, and NORCITSTR is the normalized citation string.
is a reference to a hash:

    { title    => TITLE,
      authors  => [ AUTHOR1NAME, AUTHOR2NAME, ... ],
      type     => TYPE,
      location => LOCATION
    }

Here TITLE is the work's title (a string), AUTHOR1NAME, AUTHOR2NAME and so on are the names of the authors (strings), and TYPE is one of "article", "paper", "book", "software", "chapter", "text" (when no particular type is known).
LOCATION is a string built by joining the following items: the series or journal name, the paper's number in the series (if it is present), the issue/volume/pages. For AMF we can take all the adjectives of the serial adjective container.
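
As a rough illustration of the document hash and of how LOCATION might be assembled, consider the sketch below. The helper name `build_location`, the `', '` separator and the sample values are assumptions made for this example; the text above only says that the items are joined.

```perl
use strict;
use warnings;

# Hypothetical helper: join the non-empty location parts
# (series or journal name, number in the series, issue/volume/pages).
# The ', ' separator is an assumption.
sub build_location {
    my @parts = grep { defined $_ && length $_ } @_;
    return join ', ', @parts;
}

# Made-up sample document; the third location part is absent.
my $doc = {
    title    => 'An Example Working Paper',
    authors  => [ 'Jane Doe', 'John Smith' ],
    type     => 'paper',
    location => build_location( 'Example Working Paper Series', '2005-07', undef ),
};
```

Skipping undefined or empty parts keeps the string clean when, for instance, a paper has no number in its series.
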
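
    use String::Approx 'amatch';

    # Set $pass if any of the names approximately matches
    # the citation string.
    my $pass = 0;
    foreach my $name ( @names ) {
        if ( amatch( $name, $citation ) ) {
            $pass = 1;
            last;
        }
    }

If the document is an author pass, it will be ranked according to the
string similarity of the title only; see the next point.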
2. Compare titles. Take the normalized citation string and the normalized
   title of the research item. Find where in the citation
   string the first word of the title occurs.
   Starting from that position, take as many characters of the citation
   string as there are in the title and compare them to the work's title
   (with [String::Similarity](http://search.cpan.org/perdoc?String::Similarity)).

   (Alternatively, find where in the citation string the last word of the
   title occurs and take a substring from start to end. Compare in the
   same way.)

   (Alternatively, do a String::Approx amatch on the citation
   string, with the document's title used as the pattern. --
   No, this would give just a yes/no value, but we need a
   numeric measure.)

   Get the comparison result as a number between 0 and 1.
- Also see the [Digital Libraries and Autonomous Citation
Indexing](http://citeseer.ist.psu.edu/aci-computer/aci-computer99.html)
paper and its section "Methods" for a brief discussion of
what the CiteSeer team tried.
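
The first variant of the title comparison described above could look roughly like the sketch below. The function names `title_similarity` and `simple_similarity` are hypothetical. To keep the example free of non-core modules, `simple_similarity` is a Levenshtein-based stand-in for the `similarity` function of String::Similarity; it also returns a number between 0 and 1, but the real module may give different values. Both input strings are assumed to be already normalized.

```perl
use strict;
use warnings;
use List::Util qw( min max );

# Stand-in for String::Similarity: a Levenshtein-based ratio
# between 0 and 1 (1 means the strings are identical).
sub simple_similarity {
    my ( $s, $t ) = @_;
    my $longest = max( length $s, length $t );
    return 1 if $longest == 0;

    # Classic two-row dynamic programming for the edit distance.
    my @prev = ( 0 .. length $t );
    for my $i ( 1 .. length $s ) {
        my @cur = ( $i );
        for my $j ( 1 .. length $t ) {
            my $cost = substr( $s, $i - 1, 1 ) eq substr( $t, $j - 1, 1 ) ? 0 : 1;
            push @cur, min( $prev[$j] + 1, $cur[ $j - 1 ] + 1, $prev[ $j - 1 ] + $cost );
        }
        @prev = @cur;
    }
    return 1 - $prev[-1] / $longest;
}

# Hypothetical helper: locate the first word of the title in the
# citation string, cut a window as long as the title, and compare.
sub title_similarity {
    my ( $citation, $title ) = @_;
    my ( $first_word ) = split ' ', $title;
    return 0 unless defined $first_word;

    my $pos = index( $citation, $first_word );
    return 0 if $pos < 0;    # first word of the title not found

    my $window = substr( $citation, $pos, length $title );
    return simple_similarity( $window, $title );
}

# Made-up example strings.
my $citation = 'DOE J 2005 CITATION MATCHING IN DIGITAL LIBRARIES EXAMPLE JOURNAL 12 3';
my $score    = title_similarity( $citation, 'CITATION MATCHING IN DIGITAL LIBRARIES' );
```

Note that `index` finds the first occurrence of the word anywhere in the citation string, including inside a longer word or in an author's name, so a real implementation may need to try several positions and keep the best score.
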
### Citation suggestions
#### Citation to document similarity table: cit\_doc_similarity
Table cit\_doc_similarity fields:
- citation id:

Each citation item in the userdata is a hash with these items:

- cnid
- ostring
- srcdocid
- srcdoctitle
- srcdocauthors
- srcdocurlabout
- reason (if manually added)
- autoaddreason (if automatically added)
- autoadded (if automatically added)

### Storing and passing citations around

Citations will be stored as:

1. citation table records
2. in-memory suggestions
3. userdata items (see above)

Basically, the citations will go from 1 to 2 and from 2 to 3. At each step, some hash keys may be added to a citation or may be removed. But sometimes a citation is removed from userdata and then it may be re-considered as a suggestion (3->2).

### Modules

#### ACIS::Citations

#### ACIS::Citations::Utils

- normalize_string( string )
- build_citations\_index( citlist, [index] ); - build a hash of [ cnid: citation ] pairs ???
- get_document\_authors(); - for co-authors' claims
- cit_document\_similarity( cit, doc ) -- default cit-doc similarity func

and other useful stuff

#### ACIS::Citations::Input

    use ACIS::Citations::Utils;

- process_citation( cit ) - normalize, cut the editors, calculate the checksum
- save_citation( cit )
- check_citation( cit ) - reload citation from the citations table or return undef otherwise

#### ACIS::Citations::Suggestions

#### ACIS::Citations::SimMatrix

    use ACIS::Citations::Suggestions;

exported:

- load_similarity\_matrix( record ); - returns matrix structure as specified above

object methods:

- most\_interesting\_doc( );
- remove_citation( $matrix, $cit );

...

internal:

- _calculate_totals( $matrix );

#### ACIS::Citations::Search

    use ACIS::Citations::Utils;
    use ACIS::Citations::Suggestions;

exportable:

- search\_for\_document( id ) - find pre-identified citations
- search\_for\_personal\_names( names )

...

#### ACIS::Web::Citations::Profile

    use ACIS::Citations::Input;
    use ACIS::Web::SysProfile;

#### ACIS::Web::Citations

- acis\_citations\_enabled() - check that