0839_CSi: The first million word corpus is complete

by ceteratolle posted Mar 17, 2018

austraLasia 839
 
CSi: The first million word corpus is complete
 
ROME: 2nd May '04 --  CSi stands for Corpus Salesianum (Italian), a collection of texts in digital form from the Salesian 'magisterium' or teaching authority.  The collection has now reached its first million words, representing approximately 180 individual texts from the past 25 years.  The starting point is the beginning of Fr. Vigano's period as Rector Major, and the collection includes all of his letters, those of Fr. Vecchi and Fr. Chavez, and other texts of a magisterial nature, such as General Chapter documents (23, 24, 25), the renewed Ratio, various guidelines indicated in the 'Acts' and so forth.
 
The texts are first converted to plain text, that is, unadorned, unformatted '.txt'.  They are then indexed electronically.  At this point it is possible to retrieve any word or phrase in the collection instantly; with further analysis (likewise digital) it is then possible to identify statistical relationships, for example the mutual information (MI) scores between chosen terms, or the consistency of a term across the entire corpus.
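The steps just described (plain text, index, retrieval, MI, consistency) can be sketched roughly as follows. This is only an illustration and not the tooling actually used for CSi. The function names, the tokenisation rule and the simplified MI formula log2(co-occurrences x N / (f(node) x f(collocate))) are all assumptions, and 'consistency' is read here simply as the share of texts in which a term appears.

```python
import math
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase a plain-text string and split it into word tokens."""
    return re.findall(r"\w+", text.lower())


def build_index(texts):
    """Map each word to the (text number, position) pairs where it occurs."""
    index = defaultdict(list)
    for doc_id, text in enumerate(texts):
        for pos, word in enumerate(tokenize(text)):
            index[word].append((doc_id, pos))
    return index


def consistency(index, word, n_texts):
    """Share of texts that contain the word at least once (assumed measure)."""
    return len({doc_id for doc_id, _ in index.get(word, [])}) / n_texts


def mi_score(texts, node, collocate, span=4):
    """Simplified MI: log2(co-occurrences * N / (f(node) * f(collocate)))."""
    total = f_node = f_coll = co = 0
    for text in texts:
        tokens = tokenize(text)
        total += len(tokens)
        f_node += tokens.count(node)
        f_coll += tokens.count(collocate)
        for i, tok in enumerate(tokens):
            if tok == node:
                # Words within `span` positions either side of the node word.
                window = tokens[max(0, i - span):i] + tokens[i + 1:i + 1 + span]
                co += window.count(collocate)
    if co == 0:
        return None  # the pair never co-occurs within the span
    return math.log2(co * total / (f_node * f_coll))
```

A higher score suggests that two words occur together more often than their separate frequencies would predict. Concordancing packages differ in how they normalise for window size, so scores from this sketch are not comparable with theirs.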
 
It may be of interest to know that the ten most frequent terms, numerically and consistently, of Salesian discourse in the post-Vatican II period as represented by these texts are, in order: life, Don Bosco, young people, God, community, Church, Spirit, Christ, formation, confreres/Salesians.  These are then followed by mission, faith, family, congregation, experience.  Of course, mere numbers are insufficient.  We need to know what collocates with what.  It helps to know that life associates strongly with two concepts in particular: one that is expressed in terms of consecration-religious-Salesian-apostolic; the other with life's daily-sociocultural-fraternal-communitarian quality.  Don Bosco also collocates with two main conceptual groupings: charism-Salesian-spirit-preventive system-mission-Salesian Family-heart-pedagogy on the one hand (words to the left of DB) and young people-history-father and teacher-place (Valdocco, Rome...)-mother on the other (words to the right).  The corpus also demonstrates the key meaningful word clusters around each word or concept, and this is most helpful to a deeper understanding of certain Salesian magisterial rhythms.  If it is God that Salesians speak about, then we know that they most frequently, and again in order, speak of God's love for the young, the people of God, the Word of God and our relationships with God.
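The distinction between words to the left and words to the right of a term can be illustrated with a short sketch. It is a hypothetical helper, not the method behind the figures above. It assumes the text is already a list of tokens and that the node is a single token, so a two-word name such as Don Bosco would first have to be merged into one token.

```python
from collections import Counter


def left_right_collocates(tokens, node, span=3):
    """Count the words found up to `span` places left and right of `node`."""
    left, right = Counter(), Counter()
    for i, tok in enumerate(tokens):
        if tok == node:
            left.update(tokens[max(0, i - span):i])    # words before the node
            right.update(tokens[i + 1:i + 1 + span])   # words after the node
    return left, right
```

Calling `left.most_common(10)` on the result lists the strongest left-hand neighbours by raw frequency. Ranking by an association score such as MI instead would keep very common words like 'the' from dominating.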
 
CSi will continue to expand.  Of all the corpora possible within Salesian discourse, it is the most important because of its association with the original and official language of the Congregation.  But plans are afoot to develop CSe, and then, hopefully, CSs, CSf and CSp.  At this point the corpora of the Congregation's major language groups become of immense value to translators and offer the possibility of much greater consistency.  Perhaps it is no accident that CS is also a designation for 'comunicazione sociale', the Social Communications Department out of which this work has grown.
____________________________________________
'austraLasia' is an email service for the Salesian Family of Asia-Pacific. If you wish to be added to or removed from this list, please contact jbfox@sdb.org. Back issues of austraLasia are available on www.bosconet.aust.com. Consider also the possibility of contributing to Lexisdb.
 
 
