You want to convert ASCII text to HTML.
Use the simple little encoding filter in Example 20.3.
#!/usr/bin/perl -w -p00
# text2html - trivial html encoding of normal text
# -p means apply this script to each record.
# -00 means that a record is now a paragraph
use HTML::Entities;
$_ = encode_entities($_, "\200-\377");
if (/^\s/) {
    # Paragraphs beginning with whitespace are wrapped in <PRE>
    s{(.*)$} {<PRE>\n$1</PRE>\n}s;           # indented verbatim
} else {
    s{^(>.*)} {$1<BR>}gm;                    # quoted text
    s{<URL:(.*?)>} {<A HREF="$1">$1</A>}gs   # embedded URL (good)
                    ||
    s{(http:\S+)} {<A HREF="$1">$1</A>}gs;   # guessed URL (bad)
    s{\*(\S+)\*} {<STRONG>$1</STRONG>}g;     # this is *bold* here
    s{\b_(\S+)\_\b} {<EM>$1</EM>}g;          # this is _italics_ here
    s{^} {<P>\n};                            # add paragraph tag
}
Converting arbitrary plain text to HTML has no general solution because there are too many different, conflicting ways of representing formatting information in a plain text file. The more you know about the input, the better the job you can do of formatting it.
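The second argument to encode_entities in the filter is worth a closer look. The following is a minimal sketch, assuming HTML::Entities is installed; the sample string and the variable names $text, $high_only, and $default are made up for illustration. It compares the restricted character set the filter passes with the module's default set:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTML::Entities;

# Hypothetical sample input; \351 is e-acute in Latin-1.
my $text = "caf\351 <URL:http://www.perl.com/> & more";

# Restricted set, as in the filter: only characters \200-\377 are encoded,
# so <, >, and & pass through unchanged.
my $high_only = encode_entities($text, "\200-\377");

# Default set: control and high-bit characters plus <, >, &, and quotes.
my $default = encode_entities($text);

print "$high_only\n";
print "$default\n";
```

Because the restricted call leaves <, >, and & untouched, the later substitutions can still find <URL:...> markers and lines quoted with >. The cost is that any other literal < or & in the input reaches the output unescaped, which is one reason the filter is only a trivial encoding.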
For example, if you knew that you would be fed a mail message, you could add this block to format the mail headers:
BEGIN {
    print "<TABLE>";
    $_ = encode_entities(scalar <>);
    s/\n\s+/ /g;                          # continuation lines
    while ( /^(\S+?:)\s*(.*)$/gm ) {      # parse heading
        print "<TR><TH ALIGN='LEFT'>$1</TH><TD>$2</TD></TR>\n";
    }
    print "</TABLE><HR>";
}
See also the documentation for the CPAN module HTML::Entities.