Version 15 of EasyTextPrint

Updated 2016-02-15 03:21:55 by HJG

if 0 {


Summary

HJG 2016-02-02: This is another attempt at quick, easy, ad-hoc printing of plain text files.

I often have some information in a text file, and when I need to print it, I want a nice-looking page,
e.g. with a few headers and some text in bold, but I don't want to use an 'Office'-style word processor for that.

The idea is to convert the text file to an HTML file, then print that from the web browser.

The basic operation of the converter is to copy the input file x.txt to x.html,
add some lines like "<html>", "<head>", "<body>" etc., and wrap the first line of text in <h1>-tags.
It also replaces some chars (like &, <, >) with HTML entities.

Then I can drop the resulting file x.html into the browser and print.
Or, with a fixed location for the output file, use a bookmark in the browser.

Add some CSS to taste, and extend the converter's "basic operation" to cover more markup as the need arises.

There are some programs available that work like that, e.g. Markdown.
But the original Markdown is a Perl script, and I want even more minimal markup.


With ideas and code from the following pages:

}


Code

 # EasyTextPrint001.tcl - HaJo Gurt - 2016-02-02
 # http://wiki.tcl.tk/42409

  puts "EasyTextPrint:"
  set i 0
  foreach s { foo bar grill }  { incr i; puts "$i $s" }

...todo...

 ### EOF ###
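
The Tcl script above is still a stub. As a rough illustration of the "basic operation" described in the summary (escape &, <, >, use the first line as title and <h1>, write x.html next to x.txt), here is a minimal sketch in Python rather than Tcl. The function names text_to_html and convert_file are made up for this example, and wrapping the remaining lines in <p>-tags is an assumption borrowed from the awk version below.

```python
import html
from pathlib import Path


def text_to_html(text: str) -> str:
    """Basic operation only: escape special chars, first line -> <h1>, rest -> <p>."""
    # quote=False escapes only &, < and >, which is all the summary asks for
    lines = [html.escape(line, quote=False) for line in text.splitlines()]
    title = lines[0] if lines else ""
    out = ["<html>", "<head>", f"<title>{title}</title>", "</head>", "<body>"]
    if lines:
        out.append(f"<h1>{title}</h1>")
    for line in lines[1:]:
        if line.strip():                      # blank lines produce no output
            out.append(f"<p>{line}</p>")
    out += ["</body>", "</html>"]
    return "\n".join(out) + "\n"


def convert_file(txt_path: str) -> Path:
    """Write x.html next to x.txt and return the path of the new file."""
    src = Path(txt_path)
    dst = src.with_suffix(".html")
    dst.write_text(text_to_html(src.read_text(encoding="utf-8")), encoding="utf-8")
    return dst


if __name__ == "__main__":
    print(text_to_html("Tom & Jerry\nif a < b then print"), end="")
```

The resulting file can then be dropped into the browser and printed, as described in the summary.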

Code - awk

I also wrote this program in awk, and that script already has some more features implemented:

#!/usr/bin/awk -f
# txt2htm.awk - gurt.gmx@de - 2016-02-15
#
#: Read plain text, output as html, marked up for printing via webbrowser

#: Markup - String at start of line determines type of header in next line:
#  ^^ H1-header in next line (implicit just before first line of inputfile)
#  == H2-header in next non-comment, non-blank line
#  -- H3-header in next non-comment, non-blank line

# Usage:
#   gawk -f txt2htm.awk  Tel.txt
#   gawk -f txt2htm.awk City.txt > City.html

# See also: https://css-tricks.com/almanac/properties/p/page-break/

#
#-##+####1####+####2####+####3####+####4####+####5####+####6####+####7####+###
#
  function chr(c) \
  {
    return sprintf( "%c", c+0 );  # make c numeric by adding 0
  }

  BEGIN           { Q1  = "'"; Q2  = "\"";  # Quotes
                    A   = "\\&";
                    LineNr = 0;
                    Skip   = 0
                    Title  = "EasyTextPrint"
                    Cmd    = "Hdr";
                    H      = 1;
                    Prev   = "H";
                  }

  function Head(T) \
  {
                    print("<!DOCTYPE HTML>")
                    print("<html>")
                    print("<HEAD>")
                    print("<meta http-equiv=\"Content-Type\" content=\"text/html; charset=UTF-8\" />")

                    print("<style type=\"text/css\"> ")
                    print("* {")
                    print(" margin:       0;")
                    print(" margin-left: 10px;")
                    print(" padding:      0; }")
                    print("body {")
                    print(" background:  silver;")
                    print(" font-family: verdana, helvetica, sans-serif;")
                    print(" font-size:   12px;")
                    print("}")
                    print("h1,h2,h3,h4,h5,p,ul,li,hr {")
                    print(" padding:        1px;")
                    print(" background:     #eeEEee; } ")
                    print("h1 { background: #ffFF80; text-align: center; } ")
                    print("h2 { background: #80FFFF; text-decoration: underline; } ")
                    print("h3 { background: #80FF80; } ")
                    print("h4 { background: #FF8080; } ")
                   #print(" ...more style-css...")
                    print("</style>")

                    print("<TITLE>" T "</TITLE>")
                    print("</HEAD>\n");
    return
  }

  /^##__/         { exit }
  /^##!!/         { print "<DIV style=\"page-break-after:always\"></DIV>"; Cmd="FF"; next }

  /^_$/           { print "&nbsp;"; next }

  /^##\++/        { Skip=1 }   ##++ skip
  /^##--/         { Skip=0 }   ##--Start--
   Skip>0         { next }

  /^#/            { next }

  /^\^+/          { Cmd = "Hdr"; H=1; next }
  /^==/           { Cmd = "Hdr"; H=2; next }
  /^--/           { Cmd = "Hdr"; H=3; next }

  NF<1            { print; next }

                  { gsub( "&", A"amp;"); }
                  { gsub( "<", A"lt;" ); }
                  { gsub( ">", A"gt;" ); }

                  { gsub( "Ä", A"Auml;" ); }
                  { gsub( "Ö", A"Ouml;" ); }
                  { gsub( "Ü", A"Uuml;" ); }
                  { gsub( "ä", A"auml;" ); }
                  { gsub( "ö", A"ouml;" ); }
                  { gsub( "ü", A"uuml;" ); }
                  { gsub( "ß", A"szlig;"); }

                  { gsub( "²", A"sup2;");  }
                 #{ gsub( "-", A"ndash;"); }    # disabled: this turned every ASCII hyphen into an en dash
                  { gsub( "–", A"ndash;"); }    # en dash (U+2013) -> &ndash;

                  { sub( "[*][*]", "<B>");  }
                  { sub( "[*][*]", "</B>"); }

                  { sub( "//", "<I>");  }
                  { sub( "//", "</I>"); }

                  { sub( "__", "<U>");  }
                  { sub( "__", "</U>"); }

                  { sub( "%%", "<center>");  }   # ^^
                  { sub( "%%", "</center>"); }

  /^ /            { print("<pre>" $0 "</pre>" ); next }


  LineNr==0       { Title = $0;
                    LineNr++;
                    Head(Title);
                    print("<BODY>");
                   #next
                  }

  Cmd=="Hdr"      { Hdr = $0; LineNr++; Cmd="";
                    if (Prev!="H") { print("<hr>\n"); }        
                    print("<h" H ">" Hdr "</h" H ">");      # H1..H3
                    Prev="H";
                    next
                  }

  /^\*/           { T = substr( $0, 2 );        # drop the leading '*'
                    print("<UL><LI>" T "</LI></UL>" ); Prev="u"; next
                  }

                  { print("<p>" $0 "</p>" ); Prev="p"; next }
#                 { print }

  END             { # print "# Done."
                    print("</BODY>")
                    print("</html>")
                  }
#.

Input

This is an example of a plain textfile used as input.

It will show pretty much all features implemented for now, along with some of the more common special chars.

With the slimmed-down CSS above, the result should be 2 printed pages (A4).

# Comment - this is the file: City.txt
# 2* H1:
Großstädte in Deutschland
^^
Kommunalverband besonderer Art 
 
==

##++ skip

# Test1:
==
Test-H2
Textstyle: **bold** //italic// __underline__
Umlaute: < ÄÖÜ & äöüß >
--
Test-H3
Text-Paragraph
Text=P
 Text-Pre
 Text=Pre
* Text-UL
* Text=UL
--
Jäger, Müller & Förster GmbH & Co. KG 
Erzhäuser Straße. 90, 88662 Überlingen
Tel. 07773 85 86 87
--
Lorem ipsum 
ubique nostro singulis in vix, vis eu doctus scripserit ullamcorper. His quidam detraxit referrentur ei, affert adolescens intellegam sea in. Eros phaedrum imperdiet vim ei, ex amet voluptatum efficiendi eos, nihil sanctus intellegebat at nec. Adipisci theophrastus ei duo, eos cu conceptam percipitur, an dicta eripuit similique his. Graeci convenire in sit, eum errem laoreet ancillae ut, qui at facilisi periculis. Nec scripta denique percipit in, at inani probatus est. 


##--Start--
==
Niedersachsen
--
Göttingen
Niedersachsen
Einwohner:         117.665 
Postleitzahlen:         37001–37099
Vorwahl:         0551
Kfz-Kennzeichen:         GÖ
37083 Göttingen
--
Hannover
Niedersachsen
Höhe:         55 m ü. NHN
Fläche:         204,14 km²
Einwohner:         523.642 
Postleitzahlen:         30159–30659
Vorwahl:         0511
Kfz-Kennzeichen:         H
30159 Hannover
--

==
Baden-Württemberg
--
Reutlingen
Baden-Württemberg
Regierungsbezirk:         Tübingen
Landkreis:         Reutlingen
Einwohner:         112.452 
Postleitzahlen:         72760–72770
Vorwahlen:         07121, 07072 und 07127
Kfz-Kennzeichen:         RT
72764 Reutlingen
--
==
Saarland
--
Saarbrücken
Saarland
Einwohner:         180.047
Postleitzahlen:         66001–66133
Vorwahlen:         0681, 06893, 06897, 06898, 06805, 06806, 06881
Kfz-Kennzeichen:         SB
66111 Saarbrücken
--

##!! page-break

==
Nordrhein-Westfalen
--
Aachen
Nordrhein-Westfalen
Einwohner:         243.336
Postleitzahlen:         52056–52080
Vorwahlen:         0241, 02403, 02405, 02407, 02408
Kfz-Kennzeichen:         AC, MON
52062 Aachen
--
Bergisch Gladbach
Nordrhein-Westfalen
Einwohner:         109.697 
Postleitzahlen:         51427–51469
Vorwahlen:         02202, 02204, 02207
Kfz-Kennzeichen:         GL
51465 Bergisch Gladbach
--
Moers
Nordrhein-Westfalen
Einwohner:         102.923 
Postleitzahlen:         47441–47447
Vorwahl:         02841
Kfz-Kennzeichen:         WES, DIN, MO
47441 Moers
--
Neuss
Nordrhein-Westfalen
Einwohner:         152.644 
Postleitzahlen:         41460–41472
Vorwahlen:         02131, 02137, 02182
Kfz-Kennzeichen:         NE, GV
41460 Neuss
--
Paderborn
Nordrhein-Westfalen
Einwohner:         145.176 
Postleitzahlen:         33098–33109
Vorwahlen:         05251, 05252, 05254, 05293
Kfz-Kennzeichen:         PB, BÜR
33098 Paderborn
--
Recklinghausen
Nordrhein-Westfalen
Einwohner:         114.147 
Postleitzahlen:         45601–45665
Vorwahl:         02361
Kfz-Kennzeichen:         RE, CAS, GLA
45657 Recklinghausen
-- 
Siegen
Nordrhein-Westfalen
Kreis:         Siegen-Wittgenstein
Einwohner:         100.325 
Postleitzahlen:         57072–57080
Vorwahlen:         0271, 02732 (Meiswinkel), 02737 (Feuersbach)
Kfz-Kennzeichen:         SI, BLB
57072 Siegen
--

_
 Hi    Hi
 Hi    Hi
 Hi Hi Hi
 Hi    Hi
 Hi    Hi
_

%%End%%

##__EOF__

dont print this
bla
blah

Comments

HJG 2016-02-13: Change of plan: there is no need to use H6 as a page break, and I want to use all the headers H1, H2, H3 directly.
The demo input file has been modified.

Markup

  • # : Comments: lines starting with a '#' don't get printed.
  • ## : Commands: some special comments are used as commands:
    • ##__ : End-of-file. Stop printing, end the program.
    • ##!! : Pagebreak. Continue printing on a new page.
    • ##++ : Start-marker: pause printing, and skip the following lines, until the endmarker '##--' is found.
    • ##-- : Endmarker: resume printing.
  • The first non-comment line of the textfile will be used as title and H1-header.
  • ^^ : The text in the following (non-comment, non-blank) line will be used as an H1-header.
  • == : Ditto, H2-header to follow.
  • -- : Ditto, H3-header to follow.
    The next lines will be formatted as 'normal' text.
    Normal text gets wrapped in <p>-tags.
  • Textstyles: **bold** //italic// __underline__ %%centered%%
  • Lines starting with a blank: the line gets wrapped in <pre>-tags ==> preformatted text
  • Lines starting with a '*' : the line gets wrapped in <UL><LI>-tags ==> unnumbered list
  • A line with a single '_' : it gets replaced with a &nbsp; ==> blank line
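
The line-level rules listed above amount to a small state machine: a skip flag for the '##++' ... '##--' range, and a pending header level set by '^^', '==' or '--'. The following Python sketch mirrors the order of the rules in the awk script, under some simplifications that are assumptions of this example: blank lines are dropped instead of echoed, the inline text styles are left out, and no <hr> is emitted between sections. The name render_body is made up here.

```python
import html

# marker at start of line -> header level for the next printable line
HEADER_MARKS = {"^^": 1, "==": 2, "--": 3}


def render_body(lines):
    """Turn marked-up text lines into a list of HTML fragments."""
    out = []
    skip = False
    level = 1                      # the first printable line becomes an H1 header
    for line in lines:
        if line.startswith("##__"):            # end-of-file command
            break
        if line.startswith("##!!"):            # page break command
            out.append('<DIV style="page-break-after:always"></DIV>')
            continue
        if line == "_":                        # forced blank line
            out.append("&nbsp;")
            continue
        if line.startswith("##+"):             # start of a skipped range
            skip = True
        elif line.startswith("##--"):          # end of a skipped range
            skip = False
        if skip or line.startswith("#"):       # skipped text and comments
            continue
        if line[:2] in HEADER_MARKS:           # header marker: remember the level
            level = HEADER_MARKS[line[:2]]
            continue
        if not line.strip():                   # blank lines never become headers
            continue
        text = html.escape(line, quote=False)
        if line.startswith(" "):
            out.append(f"<pre>{text}</pre>")
        elif level:
            out.append(f"<h{level}>{text}</h{level}>")
            level = 0
        elif line.startswith("*"):
            out.append(f"<UL><LI>{text[1:]}</LI></UL>")
        else:
            out.append(f"<p>{text}</p>")
    return out
```

Because comments and blank lines are filtered out before the header level is consumed, a marker line may be followed by any number of them before the actual header text.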

Features

  • Comments, <pre>, <UL>, H1..H3, EOF, and skip-ranges are extensions to the "basic operation" of the converter.
  • Blank lines are never taken as header text, so the input file can be as spaced-out as you want.
  • The special chars I use most commonly are replaced with html-entities (ÄÖÜ, dashes, etc.) - Easy to extend.
  • Textsize, line-height, margins, padding are set to minimal values, to fit as much text on a page as possible.
    To see how much space a normal print would need, use the browser's "Inspect element", and uncheck 'margins' in the Rules-tab.
  • Light background-colors, to show the structure of the text. And to make it easy to spot errors...
  • Pagebreak is a CSS-feature that only works when printing.
    To see the position of the break, change the empty DIV, e.g. to '<DIV style="page-break-after:always">-</DIV>'.
  • Print-Preview in the browser lets you customize headers and footers, e.g. filename, page numbers, etc.

Quirks

  • No ordered-lists: I rarely use these, so I have no plans to implement them here, and I wanted '#' as comment-char.
  • No links, no images, no forms. Well, this is for printing fairly short notes etc., not for browsing.
  • Center: uses the obsolete tag '<center>'.
    Also, I wanted the markup as '^^center^^', but ^ is a very special char - This might get fixed.
  • Bold/italic/underline: currently, only the first occurrence of text thus marked gets rendered - Todo.
  • Unnumbered-lists: only first level is supported for now - Todo.
  • No tables - Todo.
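
One possible way to handle the bold/italic/underline quirk is to replace whole marker pairs with a non-greedy regular expression, so that every marked span on a line gets rendered and an unpaired marker is left alone. This is a sketch in Python, not a patch for the awk script; the names STYLES and apply_styles are made up for the example, and it shares the existing weakness that '//' inside a URL would be taken for italic markup.

```python
import re

# (pattern for one marked span, replacement) in the order the awk script applies them
STYLES = [
    (r"\*\*(.+?)\*\*", r"<B>\1</B>"),
    (r"//(.+?)//",     r"<I>\1</I>"),
    (r"__(.+?)__",     r"<U>\1</U>"),
    (r"%%(.+?)%%",     r"<center>\1</center>"),
]


def apply_styles(line: str) -> str:
    """Replace every complete marker pair on the line; lone markers stay as they are."""
    for pattern, repl in STYLES:
        line = re.sub(pattern, repl, line)
    return line


if __name__ == "__main__":
    print(apply_styles("Textstyle: **bold** //italic// __underline__ and **bold again**"))
```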

See also: