Version 10 of EasyTextPrint

Updated 2016-02-11 15:30:53 by HJG

if 0 {


Summary

HJG 2016-02-02: This is another attempt at quick, easy, ad-hoc printing of plain textfiles.

I often have some information in a textfile, and when I need to print it, I want a nice-looking page,
e.g. with a few headers, some text in bold etc., but I don't want to use an 'Office' word processor for that.

The idea is to convert that textfile to an html-file, then print that from the webbrowser.

The basic operation of that converter is to copy the inputfile x.txt to x.html,
add some lines like "<html>" "<head>", "<body>" etc. and wrap the first line of text in <h1>-tags.
Also, replace some chars (like &, <, >) with html-entities.
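
A minimal sketch of this basic operation in awk, wrapped in a shell function so it can be tried directly.
The name txt2html_basic is hypothetical, and wrapping the remaining lines in <p>-tags is an assumption,
as the description above only covers the first line:

```shell
# txt2html_basic: hypothetical helper, reads plain text on stdin, writes html to stdout
txt2html_basic() {
  awk '
    # escape the html special chars; "&" has to be handled first
    { gsub(/&/, "\\&amp;"); gsub(/</, "\\&lt;"); gsub(/>/, "\\&gt;") }

    # first line: emit the page header, use the line as title and h1
    NR == 1 {
      print "<html>"
      print "<head><title>" $0 "</title></head>"
      print "<body>"
      print "<h1>" $0 "</h1>"
      next
    }

    # blank lines pass through, everything else becomes a paragraph (assumption)
    NF < 1 { print; next }
    { print "<p>" $0 "</p>" }

    END { print "</body>"; print "</html>" }
  '
}

# usage: txt2html_basic < x.txt > x.html
```

The order of the gsub() calls matters: if "&" were replaced last, it would also hit the "&" of the entities just inserted.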

Then I can drop the resulting file x.html into the browser and print.
Or, with a fixed location for the output-file, use a bookmark in the browser.

Add some CSS to taste, and extend the converter's "basic operation" to cover more markup, as the need arises.

There are some programs available that work like that, e.g. Markdown.
But the original Markdown converter is a Perl script, and I want even more minimal markup.


With ideas and code from the following pages:

}


Code

 # EasyTextPrint001.tcl - HaJo Gurt - 2016-02-02
 # http://wiki.tcl.tk/42409

  puts "EasyTextPrint:"
  set i 0
  foreach s { foo bar grill }  { incr i; puts "$i $s" }

...todo...

 ### EOF ###

Code - awk

I went ahead and also wrote this program in awk; that script has some more features implemented:

#!/usr/bin/awk -f
# txt2htm.awk - gurt.gmx@de - 2016-02-02
#
#: Read plain text, output as html, marked up for printing via webbrowser

# Usage:
#   gawk -f txt2htm.awk  Tel.txt
#   gawk -f txt2htm.awk City.txt > City.html

# See also: https://css-tricks.com/almanac/properties/p/page-break/

#
#-##+####1####+####2####+####3####+####4####+####5####+####6####+####7####+###
#
  function chr(c) \
  {
    return sprintf( "%c", c+0 );  # make c numeric by adding 0
  }

  BEGIN           { Q1  = "'"; Q2  = "\"";  # Quotes
                    A   = "\\&";
#                 # print "# EasyTextPrint"
                    Part   = 0;
                    Line   = 0;  # line counter, tested by the Line==0 rule below
                    H      = 0;
                    Title = "Print-Test"
                    Skip=0
                  }

 ###

  /^##__/         { exit }
  /^##!!/         { print "<H6></H6>"; next }
  /^_$/           { print "&nbsp;"; next }

  /^##\++/        { Skip=1 }   ##++ skip
  /^##--/         { Skip=0 }   ##--Start--
   Skip>0         { next }

  /^#/            { next }

  /^==/           { print "<HR>"; Part=2; H=3; next }
  /^--/           { print "<hr>"; Part=2; H=3; next }

  NF<1            { print; next }

                  { gsub( "&", A"amp;"); }
                  { gsub( "<", A"lt;" ); }
                  { gsub( ">", A"gt;" ); }

                  { gsub( "Ä", A"Auml;" ); }
                  { gsub( "Ö", A"Ouml;" ); }
                  { gsub( "Ü", A"Uuml;" ); }
                  { gsub( "ä", A"auml;" ); }
                  { gsub( "ö", A"ouml;" ); }
                  { gsub( "ü", A"uuml;" ); }
                  { gsub( "ß", A"szlig;"); }

                  { gsub( "²", A"sup2;"); }
                  { gsub( "-", A"dash;"); }
                  { gsub( "–", A"ndash;"); }  # en-dash, e.g. in ranges like 52056–52080

                  { sub( "[*][*]", "<B>");  }
                  { sub( "[*][*]", "</B>"); }

                  { sub( "//", "<I>");  }
                  { sub( "//", "</I>"); }

                  { sub( "__", "<U>");  }
                  { sub( "__", "</U>"); }

                  { sub( "%%", "<center>");  }
                  { sub( "%%", "</center>"); }
  
  /^ /            { print("<pre>" $0 "</pre>" ); next }

  Line==0         { Title = $0; Line++;
                    Part = 1; H++
                    print("<html>")
                    print("<HEAD>")

                    print("<style type=\"text/css\"> ")
                    print("* {")
                    print(" margin:       0;")
                    print(" margin-left: 10px;")
                    print(" padding:      0; }")
                    print("body {")
                    print(" background:  silver;")
                    print(" font-family: verdana, helvetica, sans-serif;")
                    print(" font-size:   12px;")
                    print("}")
                    print("h1,h2,h3,h4,h5,p,ul,li,hr {")
                    print(" padding:         1px;")
                    print(" background:      #eeEEee; }")
                    print(" h1 { background: #80FFFF; } ")
                    print(" h2 { background: #ffFF80; } ")
                    print(" h3 { background: #80FF80; } ")
                    print(" h6 { background: #FF8080; ")
                    print("  page-break-before: always; } ");  # auto / always
                   #print(" ...more style-css...")
                    print("</style>")

                    print("<TITLE>" Title "</TITLE>")
                    print("</HEAD>")

                    print("<BODY>")
                   #print("<h1>" Title "</h1>")
                    print("<h" H ">" Title "</h" H ">");

                   #print("<code>")
                   #print("<pre>")
                    next
                  }

  Part==1         { Hdr = $0; Line++; H++
                    print("<h" H ">" Hdr "</h" H ">");  # H2
                    next
                  }

  Part==2         { Hdr = $0; Line++;
                    Part = 3;
                    print("<h" H ">" Hdr "</h" H ">");  # H3
                    next
                  }

  /^\*/           { T = $0;
                    T = substr( $0,2 );
                    print("<UL><LI>" T "</LI></UL>" ); next
                  }

                  { print("<p>" $0 "</p>" ); next }
 #                { print }

  END             { # print "# Done."
                    print("</BODY>")
                    print("</html>")
                  }
#.

Input

This is an example of a plain textfile used as input.

It will show pretty much all features implemented for now, along with some of the more common special chars.

With the 'slimlined' CSS above, the result should be 2 printed pages (A4).

# Comment - this is the file: City.txt
#
Großstädte in Deutschland
Kommunalverband besonderer Art 
 
==

##++ skip

# Test1:
Test
Textstyle: **bold** //italic// __underline__
Umlaute: < ÄÖÜ & äöüß >
--
H3
Text-Paragraph
Text=P
 Text-Pre
 Text=Pre
* Text-UL
* Text=UL
--
Jäger, Müller & Förster GmbH & Co. KG 
Erzhäuser Straße. 90, 88662 Überlingen
Tel. 07773 85 86 87
--

##--Start--
Aachen
Nordrhein-Westfalen
Einwohner:         243.336
Postleitzahlen:         52056–52080
Vorwahlen:         0241, 02403, 02405, 02407, 02408
Kfz-Kennzeichen:         AC, MON
52062 Aachen
--
Bergisch Gladbach
Nordrhein-Westfalen
Einwohner:         109.697 
Postleitzahlen:         51427–51469
Vorwahlen:         02202, 02204, 02207
Kfz-Kennzeichen:         GL
51465 Bergisch Gladbach
--
Göttingen
Niedersachsen
Einwohner:         117.665 
Postleitzahlen:         37001–37099
Vorwahl:         0551
Kfz-Kennzeichen:         GÖ
37083 Göttingen
--
Hannover
Niedersachsen
Höhe:         55 m ü. NHN
Fläche:         204,14 km²
Einwohner:         523.642 
Postleitzahlen:         30159–30659
Vorwahl:         0511
Kfz-Kennzeichen:         H
30159 Hannover
--
Moers
Nordrhein-Westfalen
Einwohner:         102.923 
Postleitzahlen:         47441–47447
Vorwahl:         02841
Kfz-Kennzeichen:         WES, DIN, MO
47441 Moers
--

##!! page-break

Neuss
Nordrhein-Westfalen
Einwohner:         152.644 
Postleitzahlen:         41460–41472
Vorwahlen:         02131, 02137, 02182
Kfz-Kennzeichen:         NE, GV
41460 Neuss
--
Paderborn
Nordrhein-Westfalen
Einwohner:         145.176 
Postleitzahlen:         33098–33109
Vorwahlen:         05251, 05252, 05254, 05293
Kfz-Kennzeichen:         PB, BÜR
33098 Paderborn
--
Recklinghausen
Nordrhein-Westfalen
Einwohner:         114.147 
Postleitzahlen:         45601–45665
Vorwahl:         02361
Kfz-Kennzeichen:         RE, CAS, GLA
45657 Recklinghausen
-- 
Reutlingen
Baden-Württemberg
Regierungsbezirk:         Tübingen
Landkreis:         Reutlingen
Einwohner:         112.452 
Postleitzahlen:         72760–72770
Vorwahlen:         07121, 07072 und 07127
Kfz-Kennzeichen:         RT
72764 Reutlingen
--
Saarbrücken
Saarland
Einwohner:         180.047
Postleitzahlen:         66001–66133
Vorwahlen:         0681, 06893, 06897, 06898, 06805, 06806, 06881
Kfz-Kennzeichen:         SB
66111 Saarbrücken
--
Siegen
Nordrhein-Westfalen
Kreis:         Siegen-Wittgenstein
Einwohner:         100.325 
Postleitzahlen:         57072–57080
Vorwahlen:         0271, 02732 (Meiswinkel), 02737 (Feuersbach)
Kfz-Kennzeichen:         SI, BLB
57072 Siegen
--
_
 Hi    Hi
 Hi    Hi
 Hi Hi Hi
 Hi    Hi
 Hi    Hi

%%End%%

##__EOF__

dont print this
bla
blah

Comments

Markup

  • # : Comments: lines starting with a '#' don't get printed.
  • ## : Commands: some special comments are used as commands:
    • ##__ : End-of-file. Stop printing, end program.
    • ##!! : Pagebreak
    • ##++ : Start-marker: skip the following lines, until the endmarker "##--" is found.
    • ##-- : Endmarker: resume printing
  • The first non-comment line of the textfile will be used as title and H1-header.
  • The lines following will be used for headers formatted as H2, H3, H4, H5, until the "End-of-headers"-marker.
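  • == : End of headers = Horizontal line. As the script stands, the next line of text will be shown as a H3-header, same as after "--".
    The lines after that will be shown as "normal" text. Normal text gets wrapped in <p>-tags.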
  • -- : Section break = Horizontal line. The next line of text will be shown as a H3-header.
    The following lines will be shown as "normal" text.
  • Textstyles: **bold** //italic// __underline__ %%centered%%
  • Lines starting with a blank: line gets wrapped in <pre>-tags ==> preformatted
  • Lines starting with "*" : line gets wrapped in <UL><LI>-tags ==> unnumbered list
  • A line with a single "_" : gets replaced with &nbsp; ==> blank line
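
The comment and command rules above amount to a small state machine. The sketch below isolates just that part in awk;
the name filter_commands is hypothetical, and the surviving lines are printed unchanged instead of being converted to html:

```shell
# filter_commands: hypothetical helper, shows only the "#" / "##" handling
filter_commands() {
  awk '
    /^##__/     { exit }                      # end-of-file marker: stop reading
    /^##!!/     { print "<h6></h6>"; next }   # page break (h6 carries the CSS rule)
    /^##[+][+]/ { skip = 1 }                  # start-marker of a skip-range
    /^##--/     { skip = 0 }                  # endmarker: resume printing
    skip        { next }                      # inside a skip-range: drop the line
    /^#/        { next }                      # plain comments, and the markers themselves
                { print }
  '
}

# usage: filter_commands < City.txt
```

As in the full script, the rule order decides the behaviour: "##__" and "##!!" are tested before the skip flag, so they also take effect inside a skip-range.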

Features

  • Comments, <pre>, <UL>, H1..H5, EOF, and skip-ranges are extensions to the "basic operation" of the converter.
  • Blank lines are never taken as headers, so the inputfile can use blank lines for spacing.
  • The most common special chars are replaced with html-entities (ÄÖÜ, dashes, etc.)
  • Textsize, line-height, margins, padding are set to minimal values, to fit as much text on a page as possible.
  • Light background-colors, to show the structure of the text. And to make it easy to spot errors...
  • Pagebreak is a CSS-feature that only works when printing.
  • Print-Preview in the browser lets you customize headers and footers, e.g. filename, pagenumbers, etc.

Quirks

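  • No check on the number of headers: a sixth consecutive header line (counting the title) would come out as <h6>, which is styled as a page break. Make sure to have a "==" marker after your headers!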
  • Bold/italic/underline: currently, only the first occurrence of these gets rendered.
  • Center: uses the obsolete tag "<center>".
    Also, I wanted the markup as "^^center^^", but ^ is a very special char...
  • unnumbered-lists: only first level is supported for now.
  • no ordered-lists: I have no plans for them, and I wanted "#" as comment-char.
  • H6: was reused as page-break, so it cannot be used as a normal header.
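
The first-occurrence limit for bold/italic/underline comes from calling sub() just once for the opening and once for the closing tag.
One possible way around it, sketched here under the assumption that markers come in pairs on the same line, is to loop until no marker is left.
The names style_all and pair are hypothetical and not part of the script above:

```shell
# style_all: hypothetical helper, converts every **..**, //..// and __..__ pair on a line
style_all() {
  awk '
    # replace each pair of markers matching re by <tag>..</tag>
    function pair(s, re, tag,    out, m) {
      out = ""
      while (match(s, re)) {
        m   = substr(s, RSTART, RLENGTH)      # the opening marker as typed
        out = out substr(s, 1, RSTART - 1)    # text before the opening marker
        s   = substr(s, RSTART + RLENGTH)
        if (!match(s, re)) {                  # no closing marker: keep it literally
          return out m s
        }
        out = out "<" tag ">" substr(s, 1, RSTART - 1) "</" tag ">"
        s   = substr(s, RSTART + RLENGTH)
      }
      return out s
    }

    {
      $0 = pair($0, "[*][*]", "B")
      $0 = pair($0, "//",     "I")
      $0 = pair($0, "__",     "U")
      print
    }
  '
}

# usage: printf "%s\n" "**a** and **b**" | style_all
```

Like the original rules, this still treats any "//" as markup, so a URL in the text would need extra care.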

See also: