Word, Pattern and Phrase Searching

In the input box, type the word, pattern or phrase you wish to search for; for examples see the respective sections below. If your search term includes accented characters, see the note on how to enter accents from your keyboard. Use the pull-down menu to the right of the input box to restrict your search to a particular document or the whole corpus (default).

 

Word Searching

The search engine is principally designed to retrieve all the occurrences of words that match a string of characters. A series of radio buttons allows you to specify whether you want the search engine to match the character string from the beginning of the word, from the end of the word, anywhere in the word or the whole word. Note that punctuation marks and articles followed by apostrophes are ignored by the search engine in word searches.

Examples:

Try searching for rendu in the whole corpus (default). If you do so with the radio button for matching the search string at the beginning of the word checked (default), you will retrieve 63 occurrences of words that begin with the letters rendu, including renduz, rendus, rendue etc. Note that you can use the case sensitive option to more effectively search for proper names (i.e., words that begin with a capital letter).

Now try entering rendu in the input box and clicking the radio button for matching the search string at the end of the word. This time you will retrieve 42 occurrences of words ending in rendu, including rendu and prendu.

Now repeat the same search, but click the radio button for matching the search string anywhere in the word. This time you will retrieve 105 occurrences of words that contain the string rendu either at the beginning or end of the word or somewhere in the middle, as in reprendue.

Finally, repeat the search one more time, this time clicking the radio button labeled "whole word." This will retrieve all 24 occurrences of the word rendu in that exact form.

You can also specify that the search string be matched only when it occurs at the end of the line by clicking the corresponding radio button. This is very useful for examining rhyming patterns. Try entering the rendu search string again, this time clicking the radio button for matching the search string at the end of the line. This will retrieve 31 occurrences with rendu or prendu at the end of the line.
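The five matching modes described above can be approximated with ordinary regular expressions. The following is a minimal sketch in Python, not the search engine's own code: the function name find_words, the mode names and the sample text are all assumptions made for illustration, and Python's \w (which also accepts digits and underscores) is only a rough stand-in for the engine's notion of a word character.

```python
import re

# Hypothetical mode names standing in for the radio buttons on the search form.
TEMPLATES = {
    "beginning":   (r"\b(?:", r")\w*"),        # string at the start of a word
    "end":         (r"\w*(?:", r")\b"),        # string at the end of a word
    "anywhere":    (r"\w*(?:", r")\w*"),       # string anywhere in a word
    "whole":       (r"\b(?:", r")\b"),         # the whole word only
    "end_of_line": (r"\w*(?:", r")(?=\W*$)"),  # word ending the line
}

def find_words(text, search, mode):
    """Return every word in text that matches search under the given mode."""
    prefix, suffix = TEMPLATES[mode]
    # Case-insensitive, as the engine is by default; MULTILINE lets $ match
    # at the end of every line rather than only at the end of the text.
    pattern = re.compile(prefix + search + suffix, re.IGNORECASE | re.MULTILINE)
    return [m.group(0) for m in pattern.finditer(text)]

# Invented two-line sample built from the word forms cited in the examples.
SAMPLE = "rendu renduz rendus rendue\nreprendue prendu"

for mode in TEMPLATES:
    print(mode, find_words(SAMPLE, "rendu", mode))
```

On this invented sample, the "end" mode keeps only rendu and prendu, while the "end_of_line" mode keeps only prendu, because rendue (the last word of the first line) does not end in the search string.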

 

Pattern Searching

The search engine implements UNIX regular expression pattern matching, a powerful language that allows you to perform a variety of "wildcard" type searches with great precision. Basically, pattern matching expressions are used to substitute for an unknown series of characters in the search string. The pattern matching expressions that can be used in searching the database corpus are explained in the table below.

. (period) matches any single word character (i.e., the letters a-z, whether lower or uppercase, unaccented or accented; but not spaces or punctuation marks) [e.g., .nfant can be used to match both enfant and infant]
* (asterisk) matches zero or more occurrences of a pattern (can be used after the . expression, after any single character, or after expressions contained in brackets or parentheses; see below) [e.g., Carlemaig*ne can be used to match both Carlemaigne and Carlemaine]
+ (plus sign) matches one or more occurrences of a pattern (can be used after the . expression, after any single character, or after expressions contained in brackets or parentheses; see below) [e.g., Atil+ can be used to match Atile, Atille etc.]
[abc] (choice in brackets) matches any single character contained in the brackets; including a circumflex (^) as the first character after the left bracket matches all characters except those included in the brackets [e.g., [^at]tile ]
[a-z] (range in brackets) matches a single character found in the specified range; including a circumflex (^) as the first character after the left bracket matches all characters except those included in the range [e.g., [^at]tile ]
(xxx|yyy) (alternative patterns) the 'or' operator | used within parentheses permits the specification of alternative patterns. This is useful for simultaneously searching for accented and unaccented forms of the same character (see note below on accented characters), or even alternative sub-patterns [e.g., a(t+il|til+)e ]
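The expressions in the table can be tried out with any regular expression library. The sketch below uses Python's re module as a stand-in, which is an assumption: the engine's own dialect may differ, and in particular Python's . matches any character except a newline, not just word characters. The helper name begins_with is hypothetical; it mimics the default "beginning of the word" mode with case insensitivity, and the word forms attile and attille are invented for the demonstration.

```python
import re

def begins_with(pattern, word):
    """True if the pattern matches at the beginning of the word, ignoring case."""
    return re.match(pattern, word, re.IGNORECASE) is not None

# . stands for exactly one character
print(begins_with(".nfant", "enfant"), begins_with(".nfant", "infant"))

# g* allows zero or more occurrences of g
print(begins_with("Carlemaig*ne", "Carlemaigne"), begins_with("Carlemaig*ne", "Carlemaine"))

# l+ requires at least one l
print(begins_with("Atil+", "Atile"), begins_with("Atil+", "Atille"))

# Alternative sub-patterns: doubled t or doubled l, but not both at once
for word in ["atile", "attile", "atille", "attille"]:
    print(word, begins_with("a(t+il|til+)e", word))
```

The last loop illustrates why alternatives are useful: a(t+il|til+)e accepts a word with a doubled t or a doubled l, but rejects the invented form attille, which doubles both.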

 

Phrase Searching

While the search engine does not implement Boolean logic (AND, OR, NOT), it is still possible to search for co-occurrences of two terms on the same line or for multi-word phrases. To perform such a search, type the search string into the input box and click the radio button to match the search string "anywhere in line." This will tell the search engine to ignore word boundaries when attempting to match the search string.

Phrase searching is most effectively implemented using the regular expression pattern matching language described in the table above.

Example:

In order to search for occurrences of the word fils/filz followed by the word roi (in that order), you can use the pattern fil[sz]( |.)+roi[s ] [= match 'fils' or 'filz', followed by one or more occurrences of either a space or any word character, followed by 'roi', followed by either an 's' (for the variant form) or a space (to ensure that only the whole word 'roi' or 'rois' is matched)]. With the match option "anywhere in line" selected, this search returns 16 matches, including phrases such as filz roi, filz de roi, fils le rois and fils fu le roi.
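As a rough illustration of how this pattern behaves, here is a sketch in Python under two stated assumptions. First, the engine's . (any word character) is written as \w, because Python's . would also match spaces and punctuation. Second, each line is padded with a trailing space so that roi at the very end of a line still satisfies [s ]; how the engine itself treats line ends is not documented here. The function name find_phrase and the sample lines are invented.

```python
import re

# fil[sz], then one or more spaces or word characters, then roi plus s or a space.
PHRASE = re.compile(r"fil[sz]( |\w)+roi[s ]", re.IGNORECASE)

def find_phrase(lines):
    """Return the lines in which fils/filz is later followed by roi/rois."""
    # The appended space is an assumption: it lets a line-final "roi" match.
    return [line for line in lines if PHRASE.search(line + " ")]

# Invented sample lines built from the phrases cited in the example.
LINES = [
    "filz roi",
    "filz de roi",
    "fils le rois",
    "fils fu le roi",
    "roi et fils",  # wrong order, so it should not be returned
]

for line in find_phrase(LINES):
    print(line)
```

Because the pattern is anchored on fil[sz] first, the line with the two words in the opposite order is not returned.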

 

Case Sensitivity Option

The search engine is case insensitive by default. To conduct a case-sensitive search, click the appropriate radio button before submitting your query. This option is most useful for searching for proper names. Be aware that if you use it to search for a word that begins with a lowercase letter, occurrences of that word at the beginning of a line, which are capitalized for that reason, will not be matched.

Example:

Searching for l[ae]ng at the beginning of the word (default) in the whole corpus (default) with the default setting for case insensitivity retrieves 93 words containing the search string, including lengaje, lengue, Lengres. If you are interested in restricting results to Lengre(s) / Langres, enter the search string L[ae]ng (note the capital L) and click the radio button labeled "case sensitive"; this will result in 20 matches.
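The effect of the case sensitivity option can be sketched in Python as follows. This is an illustration only: the function name words_beginning_with is hypothetical, and the word list is invented (it includes a capitalized Lengue standing for a common noun that happens to open a line).

```python
import re

def words_beginning_with(pattern, words, case_sensitive=False):
    """Return the words that begin with the pattern."""
    # Case-insensitive unless asked otherwise, like the search form's default.
    flags = 0 if case_sensitive else re.IGNORECASE
    regex = re.compile(pattern, flags)
    return [word for word in words if regex.match(word)]

# Invented sample; "Lengue" imitates a common noun capitalized at line start.
WORDS = ["lengaje", "lengue", "Lengres", "Langres", "Lengue"]

print(words_beginning_with("l[ae]ng", WORDS))                       # default
print(words_beginning_with("L[ae]ng", WORDS, case_sensitive=True))  # capital L only
print(words_beginning_with("l[ae]ng", WORDS, case_sensitive=True))  # lowercase l only
```

The last call shows the caveat noted above: a case-sensitive search for the lowercase form skips the line-initial Lengue. The second call shows the converse, since a case-sensitive search for the capitalized form picks up that same word alongside the proper names.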

 

Entering Accented Characters in the Search Form

If your keyboard will allow it, you may enter accented characters directly (European keyboards include keys with accents; Macintosh users can create accented characters by depressing the option key in conjunction with the accent indicator key followed by the vowel -- for example: option-e e yields é).

If you cannot enter accented characters directly from your keyboard, then type the vowel followed by the appropriate accent indicator from the table below, as follows:

grave = backslash. Example: à --> a\
acute = forward slash. Example: é --> e/
circumflex = caret. Example: ê --> e^
cedilla = comma. Example: ç --> c,
umlaut = double quote. Example: ö --> o"
tilde = tilde. Example: ñ --> n~

To search for both accented and unaccented forms of a letter, enclose the alternative forms within parentheses, separating with the vertical bar character (shift-\ on most keyboards). For example, to search for all occurrences of cent or çent at the beginning of words (default) in the whole corpus (default), enter (c|ç)ent or (c|c,)ent in the input box. This will retrieve 219 occurrences of words that begin with either cent or çent. See the section on pattern matching above for further explanation of pattern matching techniques.
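The accent indicator notation can be pictured as a simple substitution carried out before the search is run. The sketch below is one possible way to do this in Python and is not the search engine's actual implementation. The function name decode_accents is hypothetical, and the rule that an indicator only counts after a vowel (grave, acute, circumflex, umlaut), after c (cedilla) or after n (tilde) is an assumption based on the examples in the table.

```python
import re
import unicodedata

# Accent indicator -> Unicode combining mark
COMBINING = {
    "\\": "\u0300",  # grave
    "/": "\u0301",   # acute
    "^": "\u0302",   # circumflex
    ",": "\u0327",   # cedilla
    '"': "\u0308",   # umlaut (diaeresis)
    "~": "\u0303",   # tilde
}

# Assumed rule: vowel + one of \ / ^ ", or c + comma, or n + tilde.
INDICATOR = re.compile(r'([aeiouAEIOU])([\\/^"])|([cC])(,)|([nN])(~)')

def decode_accents(text):
    """Replace letter + accent indicator pairs with the accented letter."""
    def compose(match):
        base = match.group(1) or match.group(3) or match.group(5)
        mark = match.group(2) or match.group(4) or match.group(6)
        # NFC normalization fuses the base letter and the combining mark.
        return unicodedata.normalize("NFC", base + COMBINING[mark])
    return INDICATOR.sub(compose, text)

print(decode_accents("e/"))         # acute accent on e
print(decode_accents("(c|c,)ent"))  # the cedilla example from the text
```

A substitution of this kind is inherently ambiguous when the indicator characters also have a meaning in a pattern (for instance a caret or a comma typed directly after a letter), so this sketch should not be read as a description of how the form resolves such cases.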