Perl RegEx: Limiting the pattern to only the first occurrence of a character
I am trying to extract the content of a date element from many ill-formed SGML documents. For example, a document might contain a simple date element such as
<DATE>4 July 1936</DATE>
or
<DATE blaAttrib="89787adjd98d9">4 July 1936</DATE>
But it can also be as hairy as:
<DATE blaAttrib="89787adjd98d9">4 July 1936<EM>EM element within the date</EM></DATE>
The aim is to get "4 July 1936". Since the files are not large, I read the entire content into a variable and apply a regex. Here is a snippet of my Perl code:
{
    local $/ = undef;
    open FILE, "$file" or die "Cannot open file: $!";
    $fileContent = <FILE>;
    close FILE;
    if ($fileContent =~ m/<DATE(.*)>(.*)<\/DATE>/) {
        # $2 should be "4 July 1936", but it is not.
    }
}
Unfortunately, the regex does not work for the hairy example. The reason is that there is an <EM> element inside the <DATE> element, and the element can also span multiple lines.
Can any kind soul give me some pointers, directions, or clues?
Thanks!
Going by your example, you could try:
if ($fileContent =~ m/<DATE[^>]*>([^<]+)/) {
    # use $1 here
    # you may need to strip newlines
}
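The suggestion above works because a negated character class such as [^<] also matches newlines, so the capture stops at the first "<" whether that is the closing </DATE> or a nested <EM>. A minimal sketch of how it could be wrapped up, assuming the date text always comes directly after the opening tag; the helper name extract_date and the sample string are made up for illustration and are not from the original post:

```perl
use strict;
use warnings;

# Hypothetical helper: returns the text that directly follows the
# opening <DATE ...> tag, up to the next "<", or nothing if no match.
sub extract_date {
    my ($content) = @_;

    # [^>]* skips any attributes; [^<]+ stops at the first "<",
    # which may be </DATE> or a nested element such as <EM>.
    if ($content =~ m/<DATE[^>]*>([^<]+)/) {
        my $date = $1;
        $date =~ s/\s+/ /g;         # collapse newlines and runs of spaces
        $date =~ s/^\s+|\s+$//g;    # trim both ends
        return $date;
    }
    return;    # no DATE element found
}

# Sample input spanning several lines, with a nested EM element.
my $sample = <<'END';
<DATE blaAttrib="89787adjd98d9">4 July
1936<EM>EM element
within the date</EM></DATE>
END

print extract_date($sample), "\n";    # prints "4 July 1936"
```

Note that <DATE[^>]* would also match a tag whose name merely starts with DATE (for example a hypothetical <DATELINE>), and that only the text before the first nested element is returned. If the documents are messier than that, a real SGML/HTML parser is likely the safer route.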