Citations in ACIS
=================
FULLTEXT INDEX (nstring), INDEX (trgdocid)
is a reference to a hash:

    { ostring => CITSTR,
      nstring => NORCITSTR }

Here CITSTR is the original citation string, and NORCITSTR is the normalized citation string.
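
As an illustration, a citation hash of this shape could be built as in the minimal sketch below. The normalization rules (lowercasing, replacing anything other than ASCII letters and digits with a space) and the helper name `make_citation` are assumptions made for the example, not the rules ACIS actually applies; the sample citation is made up.

```perl
use strict;
use warnings;

# Hypothetical normalization; the exact rules are an assumption.
# Note: this simple version drops non-ASCII letters.
sub normalize_string {
    my ($str) = @_;
    $str = lc $str;                # compare case-insensitively
    $str =~ s/[^a-z0-9]+/ /g;      # punctuation and other runs become one space
    $str =~ s/^\s+|\s+$//g;        # trim leading and trailing space
    return $str;
}

# Build the { ostring, nstring } hash described above.
# make_citation is a hypothetical helper name.
sub make_citation {
    my ($ostring) = @_;
    return {
        ostring => $ostring,
        nstring => normalize_string($ostring),
    };
}

my $cit = make_citation('Doe, J. (1999). "A Sample Title." Journal of Examples, 12(3).');
print "$cit->{nstring}\n";
```

Keeping both strings lets the original be shown to the user while all matching is done on the normalized form.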
is a reference to a hash:

    { title    => TITLE,
      authors  => [ AUTHOR1NAME, AUTHOR2NAME, ... ],
      type     => TYPE,
      location => LOCATION
    }

Here TITLE is the work's title (a string); AUTHOR1NAME, AUTHOR2NAME and so on are the authors' names (strings); and TYPE is one of "article", "paper", "book", "software", "chapter", or "text" (used when no particular type is known).
LOCATION is a string built by joining the following items: the series or journal name, the paper's number in the series (if it is present), the issue/volume/pages. For AMF we can take all the adjectives of the serial adjective container.
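
A document description of this shape might be assembled as in the sketch below. The helper name `build_location`, the `", "` separator and all sample values are assumptions chosen for illustration; the text above only says that the items are joined.

```perl
use strict;
use warnings;

# Join the LOCATION parts that are present, skipping undefined or
# empty ones. The ", " separator is an assumption.
sub build_location {
    my (@parts) = @_;
    return join ', ', grep { defined($_) && length($_) } @parts;
}

# Sample values are made up.
my $doc = {
    title    => 'A Sample Title',
    authors  => [ 'Jane Doe', 'John Roe' ],
    type     => 'paper',
    location => build_location( 'Examples Working Papers', 'no. 42', undef, '' ),
};

print "$doc->{location}\n";
```

Filtering out missing parts before joining avoids stray separators when, for example, a paper has no number in its series.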
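
    use String::Approx 'amatch';

    # The document passes the author check if any author name
    # approximately matches the citation string.
    my $pass = 0;
    foreach ( @names ) {
        if ( amatch( $_, $citation ) ) {
            $pass = 1;
            last;
        }
    }
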
If the document passes the author check, it is ranked by the
string similarity of its title only; see the next point.
2. Compare titles. Take the normalized citation string and the normalized
title of the research item. Find where the first word of the title
occurs in the citation string.
Starting there, take as many characters of the citation string as there
are in the title and compare them to the work's title
(with [String::Similarity](http://search.cpan.org/perdoc?String::Similarity)).
(Alternatively, also find where the last word of the title occurs in the
citation string and take the substring from the start (the first word) to
the end (the last word). Compare in the same way.)
(Alternatively, do a String::Approx amatch on the citation
string, with the document's title used as the pattern. --
No, this yields only a yes/no value, but we need a
numeric measure.)
The comparison result is a number between 0 and 1.
- Also see the paper [Digital Libraries and Autonomous Citation
Indexing](http://citeseer.ist.psu.edu/aci-computer/aci-computer99.html),
in particular its "Methods" section, for a brief discussion of
what the CiteSeer team tried.
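
The title comparison in step 2 could be sketched roughly as follows. The function names `title_similarity` and `bigram_similarity` are hypothetical. To keep the sketch runnable without CPAN modules, the default measure is a simple stand-in (a Dice coefficient over character bigrams), which is not the algorithm String::Similarity uses; the measure is passed as a code reference so that it can be swapped.

```perl
use strict;
use warnings;

# Stand-in measure: Dice coefficient over character bigrams, 0..1.
# This is NOT String::Similarity's algorithm; it only keeps the
# sketch free of non-core dependencies.
sub bigram_similarity {
    my ( $s, $t ) = @_;
    return 1 if $s eq $t;
    return 0 if length($s) < 2 || length($t) < 2;
    my %count;
    $count{ substr( $s, $_, 2 ) }++ for 0 .. length($s) - 2;
    my $shared = 0;
    for my $i ( 0 .. length($t) - 2 ) {
        my $bg = substr( $t, $i, 2 );
        if ( $count{$bg} ) { $count{$bg}--; $shared++ }
    }
    return 2 * $shared / ( ( length($s) - 1 ) + ( length($t) - 1 ) );
}

# Compare a normalized title with a normalized citation string.
# $sim is an optional code reference: two strings in, 0..1 out.
sub title_similarity {
    my ( $nstring, $ntitle, $sim ) = @_;
    $sim ||= \&bigram_similarity;

    my ($first_word) = $ntitle =~ /^(\S+)/;
    return 0 unless defined $first_word;

    # Find where the title's first word occurs (as a whole word).
    return 0 unless $nstring =~ /\b\Q$first_word\E\b/;
    my $pos = $-[0];

    # Take as many characters as the title has, starting there.
    my $candidate = substr( $nstring, $pos, length($ntitle) );
    return $sim->( $candidate, $ntitle );
}

my $nstring = 'doe j 1999 a sample title journal of examples 12 3';
print title_similarity( $nstring, 'a sample title' ), "\n";
```

If String::Similarity is installed, its `similarity` function should be usable as the third argument, for example `title_similarity( $nstring, $ntitle, \&String::Similarity::similarity )`. Note that the sketch only uses the first occurrence of the title's first word, so a short or common first word may anchor the comparison at the wrong place.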
### Citation suggestions
#### Citation to document similarity table: cit\_doc_similarity
Table cit\_doc_similarity fields:
- citation id:

Each citation item in the userdata is a hash with these items:

- cnid
- ostring
- srcdocid
- srcdoctitle
- srcdocauthors
- srcdocurlabout
- reason (if manually added)
- autoaddreason (if automatically added)
- autoadded (if automatically added)

### Storing and passing citations around

Citations will be stored as:

1. citation table records
2. in-memory suggestions
3. userdata items (see above)

Basically, the citations will go from 1 to 2 and from 2 to 3. At each step, some hash keys may be added to a citation or may be removed. But sometimes a citation is removed from userdata and then it may be re-considered as a suggestion (3->2).

### Modules

#### ACIS::Citations

#### ACIS::Citations::Utils

- normalize_string( string )
- build_citations\_index( citlist, [index] ); - build a hash of [ cnid: citation ] pairs ???
- get_document\_authors(); - for co-authors' claims
- cit_document\_similarity( cit, doc ) -- default cit-doc similarity func

and other useful stuff

#### ACIS::Citations::Input

    use ACIS::Citations::Utils;

- process_citation( cit ) - normalize, cut the editors, calculate the checksum
- save_citation( cit )
- check_citation( cit ) - reload citation from the citations table or return undef otherwise

#### ACIS::Citations::Suggestions

#### ACIS::Citations::SimMatrix

    use ACIS::Citations::Suggestions;

exported:

- load_similarity\_matrix( record ); - returns matrix structure as specified above

object methods:

- most\_interesting\_doc( );
- remove_citation( $matrix, $cit );

...

internal:

- _calculate_totals( $matrix );

#### ACIS::Citations::Search

    use ACIS::Citations::Utils;
    use ACIS::Citations::Suggestions;

exportable:

- search\_for\_document( id ) - find pre-identified citations
- search\_for\_personal\_names( names )

...

#### ACIS::Web::Citations::Profile

    use ACIS::Citations::Input;
    use ACIS::Web::SysProfile;

#### ACIS::Web::Citations

- acis\_citations\_enabled() - check that