Free Software for DOS
Text Utilities – 3
Sort, Compare / Difference, Convert, PDF

9 Dec 2005

This page:
FILE SORTING

FILE COMPARE / DIFFERENCE

POSTSCRIPT AND PDF

CONVERT UNIX < > DOS FORMATS

CONVERT OTHER FORMATS

FILE SORTING

Also see: 32-bit SORT included with the GNU Textutils.


RPSORT — Sorts large files extremely fast.

* * * * *

[added 1998-03-21, updated 2004-06-28]

A super-fast sort program which handles large files. "RPSORT supports numerous sort key types including regular text keys, C language strings, Turbo Pascal strings, signed and unsigned binary integers of any length and several types of binary floating point numbers."

From a reader:
I tested many of the sort programs in the SimtelNet repository on text files. Most are limited somehow (like DOS sort), or choke, or take a long time to sort, or plainly produce a wrong output (missing or extra records, etc.). The final two survivors were msort and rpsort. I tested both on very long text files (tens of megabytes: the collated complete works of Shakespeare, Project Gutenberg). Msort took several tens of minutes, rpsort did the same in *seconds* (I thought it hadn't run at all.) Given that, there was nothing else to say about DOS sort programs, in my opinion.

Author: Robert Pirko (1992). Suggested by João Magalhaes.

Download rpsrt102.zip (88K).


PCSORT — Full screen text sort program, supports block, word, and multi-line sorting.

unrated

[updated 1998-03-02]

PCSORT (9K) runs as a full screen, interactive program by default but can also function in the role of command line filter. Although source file size is limited by available conventional memory, PCSORT offers an easy-to-use interface and can sort multiline records (up to 9 lines) and blocks simultaneously. Results can be viewed before being written to disk.

                /Sn   n=size of record in lines (1-9)
                /Pn   n=sort priority (1-9)
                 /R   Sort current priority in reverse order
                 /N   Numeric sort current priority
                 /C   Case sensitive sort
              /L[n]   Line sort:
                      n=record sort line (1-9)
/[B][+] nn [xx [y]]   Block or column sort:
                      nn=start column
                      xx=width
                      y=sort line (1-9)
         /W [+|-] n   Word sort:
                      n=word count
                      minus = count from end of record
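The multi-line record idea can be sketched in a few lines of Python. This is only an illustration of the concept, not PCSORT's actual code: the function name and its parameters are made up here, loosely mirroring /Sn (record size), /L[n] (sort line), /R (reverse) and /C (case sensitive).

```python
def sort_records(lines, record_size=1, key_line=1, reverse=False, case_sensitive=False):
    """Group lines into fixed-size records and sort whole records by one of their lines.

    Hypothetical helper: names and defaults are assumptions, not PCSORT's own.
    """
    # Cut the flat list of lines into records of record_size lines each.
    records = [lines[i:i + record_size] for i in range(0, len(lines), record_size)]

    def key(record):
        # A short trailing record may lack the key line; treat it as empty.
        text = record[key_line - 1] if key_line <= len(record) else ""
        return text if case_sensitive else text.lower()

    # sorted() is stable, so records with equal keys keep their original order.
    ordered = sorted(records, key=key, reverse=reverse)
    # Flatten the sorted records back into a list of lines.
    return [line for record in ordered for line in record]


if __name__ == "__main__":
    address_book = ["Smith", "12 Oak St", "Adams", "9 Elm St"]
    for line in sort_records(address_book, record_size=2, key_line=1):
        print(line)
```

Each record travels as a unit, so the street line stays with its name no matter which line is used as the key.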

Screen menu commands: F1 Displays all sort fields; Alt-F1 Resets all the sort variables to their defaults; F2 Save file; F3 New file; F4 Sort text; F5 Increase lines per record (1-9); Shift F5 Decrease lines per record; F6 Select next key priority (1-9); Shift F6 Select previous key priority; F7 Sort order (de/ascending); F8 Alphanumeric or Numeric sort; F9 Select next Field type: Line, block, word or none; Shift F9 Select previous Field type; F10 Mark the record line for line sort or mark block sort field or select sort word count; Shift F10 Reverse selection of word count.

The v. 1.1 update of PCSORT was originally published in 1991 but apparently is not widely distributed on the Net. The pcsort11.zip archive contains the asm source code, the doc file and the com program for PCSORT as updated 4/18/91 to fix a problem with form feeds at ends of data files. Also contains PCSORT article published in PC Mag: see the included *.xyw (XyWrite) docs.

Author: Michael J. Mefford, for PC Magazine (1991). Suggested by Robert Bull.

Download pcsort11.zip (40K).


RALPH — Sort lines of text in reverse alphabetical order.

* * * *

[added 2005-07-17]

RALPH sorts lines of text from right to left, i.e., lines are read backwards. If input has multi-word lines, then output will be sorted by line-final words, etc.

ear
earache
earaches
eardrop
eardrops
eardrum
eardrums
eared
earflap
earflaps
earful
earfuls
elephant
elephantiases
elephantiasis
elephantine
elephants
imprecated
imprecates
imprecating
imprecation
imprecations
raindrop
raindrops
raining
>
        eared
   imprecated
      earache
  elephantine
      raining
  imprecating
       earful
      eardrum
  imprecation
      earflap
     raindrop
      eardrop
          ear
     earaches
elephantiases
   imprecates
elephantiasis
      earfuls
     eardrums
 imprecations
     earflaps
    raindrops
     eardrops
    elephants
     elephant

abajar
abajo
desganar
desganchar
desgano
desgarbado
desgarbilada
desgarbilado
desgarbo
desgargantar
desgargantarse
desgargolar
desgaritar
desgarrada
desgarradamente
desgarrado
>
   desgarbilada
     desgarrada
 desgargantarse
desgarradamente
       desgarbo
     desgarbado
   desgarbilado
     desgarrado
          abajo
        desgano
     desganchar
         abajar
    desgargolar
       desganar
     desgaritar
   desgargantar

Syntax:  ralph [-a] [-p padding] [infile] > [outfile]
  -a           Extract analysis failures from an AMPLE log file.
  -l linesize  Set the maximum line length (default is no limit).
  -p padding   Specify the minimum padding for each line (default is 0).

If no infile is specified, ralph reads from the standard input.
If no outfile is specified, ralph writes to the standard output.
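The right-to-left ordering shown in the examples above can be reproduced with a short Python sketch: sort on each line read backwards, then right-align the result. This is an illustration only, not RALPH's source; the function name is made up, and the padding argument is just a guess at how -p behaves (a minimum output width).

```python
def ralph_sort(lines, padding=0):
    """Sort lines by their reversed text and right-align them.

    Hypothetical sketch: 'padding' is assumed to be a minimum output width.
    """
    # The sort key is the line read from its last character to its first.
    ordered = sorted(lines, key=lambda line: line[::-1])
    # Right-align to the longest line (or to the requested minimum width).
    width = max([padding] + [len(line) for line in ordered])
    return [line.rjust(width) for line in ordered]


if __name__ == "__main__":
    for line in ralph_sort(["ear", "eared", "earache", "elephant", "raining"]):
        print(line)
```

Right-aligning the output lines up the word endings, which is what makes shared suffixes easy to spot.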

Author: SIL International (1998).

1989-01-24: v1.1 for DOS16. Runs on DOS 2.0+. Handles files up to ~128K. Bug: Removes top bit from upper ASCII characters – fixed in v1.1b. Package also contains scripts with similar function, for awk and other Unix programs.

1998-09-01: v1.1b for DOS32. DJGPP build, requires 80386+ and a DOS Protected Mode Interface (CWSDPMI or other).

1998-09-01: v1.1b for Win32 console.

Downloads
  DOS16   ralph.zip         (13K)
  DOS32   ralph11b.zip      (25K)
  Win32   ralph32-11b.zip   (18K)
  Doc     ralphdoc11b.zip   (364B)

Get more programs for linguists from the SIL Software Catalog.


FILE COMPARE / DIFFERENCE

Text file compare programs are frequently used by programmers for version maintenance – but they can also be used by us common folk to compare two versions of an ASCII document (e.g., different versions of those autoexec.* files that accumulate in your root directory, file lists, etc.). The programs below may use different "difference" algorithms – each with unique strengths / limitations. This is not usually a major issue for simple uses, but you may wish to try them all to determine which suits your needs and style best. The programs listed below may not be the best picks for programming needs.


Double Lister — Dual window text comparer.

unrated

[added 1998-12-15, updated 2004-08-20]

From the docs:
...displays two files simultaneously in separate windows. These can be scrolled individually, or locked together, useful to locate differences between similar files. Options: Search for text string, split windows horizontally or vertically, change window size and tab spacing, display line ends, 7-bit mode, hex mode with offsets and alignment.

Author: Steven S. Bates (1989). Suggested by Robert Bull.

Download dl103.zip.


Visual Compare — Feature-rich file comparer.

* * * * *

A favorite for general use because of the flexible display options. Interactive and command line modes; possesses an internal viewer with scrolling capability; by default it colorizes new/old/changed text which makes for easy comprehension of differences. Split window (horiz. or vertical), dual file display option. Flexible output options. Understands UNIX formatted text.

"The maximum allowed line length in file one and file two is 2048 characters. The maximum number of lines that file one and file two each can contain is 16368. The maximum number of lines that the composite file can contain is 16368."

Difference algorithm used: "linear space refinement of the basic O(ND) difference algorithm."
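For the curious, here is a minimal Python sketch of the basic O(ND) greedy algorithm (Myers) that the quote refers to. It only computes the edit distance D (the number of inserted plus deleted items) and keeps no edit script. It is not the linear space refinement that VCOMP says it uses, and it is not taken from VCOMP's code; the function name is made up.

```python
def edit_distance(a, b):
    """Length of the shortest insert/delete script turning sequence a into b.

    Basic O(ND) greedy search; a and b may be strings or lists of lines.
    """
    n, m = len(a), len(b)
    # furthest[k] = largest x reached so far on diagonal k, where k = x - y.
    furthest = {1: 0}
    for d in range(n + m + 1):
        for k in range(-d, d + 1, 2):
            # Step down (insertion) or right (deletion), whichever gets further.
            if k == -d or (k != d and furthest[k - 1] < furthest[k + 1]):
                x = furthest[k + 1]
            else:
                x = furthest[k - 1] + 1
            y = x - k
            # Follow the "snake": matching items cost nothing.
            while x < n and y < m and a[x] == b[y]:
                x += 1
                y += 1
            furthest[k] = x
            if x >= n and y >= m:
                return d
    return n + m


if __name__ == "__main__":
    old = ["PATH=C:\\DOS", "PROMPT $P$G", "LH DOSKEY"]
    new = ["PATH=C:\\DOS;C:\\UTIL", "PROMPT $P$G", "LH DOSKEY"]
    print(edit_distance(old, new))
```

The running time grows with the number of differences D rather than with the product of the file sizes, which is why this family of algorithms does well on files that are mostly alike. The /En switch above caps the edit distance VCOMP will search for.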

Command line usage: VCOMP fileone filetwo [options]
Options:
/B...Monochrome display.
/Tn...Tab width. Range is 2-64. Default is 8.
/25...Display 25 lines if you have either an EGA or a VGA.
/43...Display 43 lines if you have an EGA.
/50...Display 50 lines if you have a VGA.
/S[-].Write edit script to standard output.
/C...Write composite file to standard output.
/D...Write difference file to standard output.
/En...Maximum edit distance. Range is 0-32736. Default is 32736.
/I...Ignore leading space and tab characters.
/K...Consider upper-case and lower-case letters equivalent.
/Z...Consider all characters significant.

Author: John R. Whitney (1993).

Download vc154.zip (38.4K).


@COMPARE — Text file comparer for very large files.

* * *

[updated 2005-12-08]

(aka "ATCOMPARE", "ACOMPARE"). Comprehensible output to screen is color coded – but you can't scroll back through output as in VCOMP. Easily digestible report-to-text file output with side-by-side comparisons (unfortunately broken word fragments can result from the program's wrapping of text when generating side-by-side comparisons).

Limitations:
Usage: @Compare [options] [<filename1> [<filename2>]]
where [options] begins with / or -, and is
a combination of the following:
P -directs output to the printer
F -directs output to a file
M -suppresses colors for monochrome monitors
T -suppresses the title header
H -suppresses highlighting in unequal lines
A -replaces graphics characters with standard Ascii codes
R -prints a report of discrepancies by field to a file
C -disables breaks after every screenful of output
L -allows for long and
E -extra long searches; not usually necessary
B -suppress direct video writes; use BIOS instead
Q -quits

Author: Brian C. Madsen (1994-98). Suggested by Marianna Van Erp.

1999-05-13: v1.8. Bug fix for fast Pentiums.

Download atcomp18.zip (23K).


jDif — Fast file difference utility.

unrated

[updated 2005-12-08]

Don't have much experience with this one. Fast, color-coded output to screen, or send results to report file.

Syntax: jdif oldfile newfile [options]

/a...370 Assembler (columns 1 to 72)
/c...COBOL (columns 7 to 72)
/f...DOS FC style output
/h...Help (this is it)
/r...output Report
/v...do not buffer output
Limitations:

"Private persons are hereby licensed to use the software at home for non-commercial purposes at no charge."

Author: Jonathan Rosenne / QSM Programming Ltd., Israel (1996). Suggested by Marianna Van Erp.

1996-05-18: v1.0.

Download jdif01.zip (36K).

jDif page.


FINTRSCT — Compare 2 files; outputs shared / unique lines to 3 report files.

unrated

[added 1998-07-05, updated 2001-07-18]

File Intersection takes a different approach to the task of file comparing. FINTRSCT compares two (smaller) files and outputs three files: one file listing lines unique to file 1; a second file containing lines unique to file 2; and a third file containing paired shared lines. Lines are numbered to allow easy location in original files. The order of lines in the input files is not relevant, and comparisons are case insensitive. Useful for comparing different versions of win.ini, autoexec.bat, etc.; also useful for comparing updated file lists (e.g., easily determine "what's new"). Two versions included: 16-bit DOS and 32-bit Windows.

Remarks: This tool acts much like a line uniqifier – but unlike the latter it doesn't require a manual merge of the two text files and post-merge sorting. The DOS version handles smaller files. I tested two 200K files (about 20,000 one-word lines each) and the program locked my machine. I then tested two 100K files (about 10,000 one-word lines, with one shared line) and it worked, but took about two minutes to process on a P-60. Win32 version untested.

USAGE: fintrsct file1 file2

Creates :
unique1 - lines unique to file1
unique2 - lines unique to file2
common  - lines common to file1 and file2
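The three-way split described above can be sketched in Python as follows. This is an illustration of the idea, not FINTRSCT's code: the function name is made up, results are returned as lists instead of being written to unique1 / unique2 / common, and pairing each shared line with its first match in the other file is an assumption.

```python
def intersect_lines(lines1, lines2):
    """Split two line lists into (unique1, unique2, common), ignoring case and order.

    Hypothetical sketch: every reported line carries its 1-based line number.
    """
    def first_seen(lines):
        # Map each lower-cased line to the line number where it first appears.
        table = {}
        for number, line in enumerate(lines, start=1):
            table.setdefault(line.lower(), number)
        return table

    seen1, seen2 = first_seen(lines1), first_seen(lines2)
    unique1 = [(n, line) for n, line in enumerate(lines1, 1) if line.lower() not in seen2]
    unique2 = [(n, line) for n, line in enumerate(lines2, 1) if line.lower() not in seen1]
    # Pair each shared line of file 1 with its first match in file 2.
    common = [(n, seen2[line.lower()], line)
              for n, line in enumerate(lines1, 1) if line.lower() in seen2]
    return unique1, unique2, common


if __name__ == "__main__":
    old = ["PATH=C:\\DOS", "prompt $p$g", "SET TEMP=C:\\TEMP"]
    new = ["PROMPT $P$G", "path=c:\\dos", "LH DOSKEY"]
    unique1, unique2, common = intersect_lines(old, new)
    print(unique1, unique2, common, sep="\n")
```

Using a lookup table keeps each test for a shared line cheap. Whether the original program works this way is unknown.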

Author: Paul Trout (1996). Suggested by Marianna Van Erp.

1996-12-07: Unnumbered release.

Download fintrsct.zip (25K).


POSTSCRIPT AND PDF

Also see AntiWord, below.

Here is a list of links to PostScript and PDF documents by Adobe and others.


a2ps (Any to PostScript) — Generates PostScript from ASCII, dvi, and other file formats.

* * * * *

[added 2005-12-08]

a2ps builds PostScript documents by adding formatting codes to a source text. Output may be sent directly to a PostScript printer, or to a file which can be viewed (and more) with Ghostscript. This is a large, complex program, but setting it up and learning it will pay off – if you need PostScript docs, you will be very happy with a2ps. Ported from Unix, 32-bit DJGPP build, requires 80386+ and a DOS Protected Mode Interface (CWSDPMI or other).

From the docs:
The format used is nice and compact: normally two pages on each physical page, borders surrounding pages, headers with useful information (page number, printing date, file name or supplied header), line numbering, pretty-printing, symbol substitution etc. This is very useful for making archive listings of programs or just to check your code in the bus. Actually a2ps is kind of bootstrapped: its sources are frequently printed with a2ps :).

While at the origin its name was derived from "ASCII to PostScript", today we like to think of it as "Any to PostScript". Indeed, a2ps supports delegations, i.e., you can safely use a2ps to print DVI, PostScript, LaTeX, JPEG etc., even compressed.

A short list of features of a2ps might look like this:

Usage:
a2ps [OPTION]... [FILE]...

Convert FILE(s) or standard input to PostScript.

Mandatory arguments to long options are mandatory for short options too.
Long options marked with * require a yes/no argument, corresponding
short options stand for 'yes'.

Tasks:
  --version        display version
  --help           display this help
  --guess          report guessed types of FILES
  --which          report the full path of library files named FILES
  --glob           report the full path of library files matching FILES
  --list=defaults  display default settings and parameters
  --list=TOPIC     detailed list on TOPIC (delegations, encodings, features,
                   variables, media, ppd, printers, prologues, style-sheets,
                   user-options)

After having performed the task, exit successfully.  Detailed lists may
provide additional help on specific features.

Global:
  -q, --quiet, --silent      be really quiet
  -v, --verbose[=LEVEL]      set verbosity on, or to LEVEL
  -=, --user-option=OPTION   use the user defined shortcut OPTION
      --debug                enable debugging features
  -D, --define=KEY[:VALUE]   unset variable KEY or set to VALUE
  -M, --medium=NAME      use output medium NAME
  -r, --landscape        print in landscape mode
  -R, --portrait         print in portrait mode
      --columns=NUM      number of columns per sheet
      --rows=NUM         number of rows per sheet
      --major=DIRECTION  first fill (DIRECTION=) rows, or columns
  -1, -2, ..., -9        predefined font sizes and layouts for 1.. 9 virtuals
  -A, --file-align=MODE  align separate files according to MODE (fill, rank
                         page, sheet, or a number)
  -j, --borders*         print borders around columns
      --margin[=NUM]     define an interior margin of size NUM

The options -1.. -9 affect several primitive parameters to set up
predefined layouts with 80 columns. Therefore the order matters: '-R
-f40 -2' is equivalent to '-2'. To modify the layout, use '-2Rf40', or
compose primitive options ('--columns', '--font-size' etc.).

      --line-numbers=NUM     precede each NUM lines with its line number
  -C                         alias for --line-numbers=5
  -f, --font-size=SIZE       use font SIZE (float) for the body text
  -L, --lines-per-page=NUM   scale the font to print NUM lines per virtual
  -l, --chars-per-line=NUM   scale the font to print NUM columns per virtual
  -m, --catman               process FILE as a man page (same as -L66)
  -T, --tabsize=NUM          set tabulator size to NUM
  --non-printable-format=FMT specify how non-printable chars are printed

Headings:
  -B, --no-header        no page headers at all
  -b, --header[=TEXT]    set page header
  -u, --underlay[=TEXT]  print TEXT under every page
  --center-title[=TEXT]  set page title to TITLE
  --left-title[=TEXT]    set left and right page title to TEXT
  --right-title[=TEXT]
  --left-footer[=TEXT]   set sheet footers to TEXT
  --footer[=TEXT]
  --right-footer[=TEXT]

The TEXTs may use special escapes.

  -a, --pages[=RANGE]        select the pages to print
  -c, --truncate-lines*      cut long lines
  -i, --interpret*           interpret tab, bs and ff chars
      --end-of-line=TYPE     specify the eol char (TYPE: r, n, nr, rn, any)
  -X, --encoding=NAME        use input encoding NAME
  -t, --title=NAME           set the name of the job
      --stdin=NAME           set the name of the input file stdin
      --print-anyway*        force binary printing
  -Z, --delegate*            delegate files to another application
      --toc[=TEXT]           generate a table of content

When delegations are enabled, a2ps may use other applications to handle
the processing of files that should not be printed as raw information,
e.g., HTML PostScript, PDF etc.

  -E, --pretty-print[=LANG]  enable pretty-printing (set style to LANG)
  --highlight-level=LEVEL    set pretty printing highlight LEVEL
                             LEVEL can be none, normal or heavy
  -g                         alias for --highlight-level=heavy
  --strip-level=NUM          level of comments stripping
  -o, --output=FILE          leave output to file FILE.  If FILE is '-',
                             leave output to stdout.
  --version-control=WORD     override the usual version control
  --suffix=SUFFIX            override the usual backup suffix
  -P, --printer=NAME         send output to printer NAME
  -d                         send output to the default printer
      --prologue=FILE        include FILE.pro as PostScript prologue
      --ppd[=KEY]            automatic PPD selection or set to KEY
  -n, --copies=NUM           print NUM copies of each page
  -s, --sides=MODE           set the duplex MODE ('1' or 'simplex',
                             '2' or 'duplex', 'tumble')
  -S, --setpagedevice=K[:V]  pass a page device definition to output
      --statusdict=K[:[:]V]  pass a statusdict definition to the output
  -k, --page-prefeed         enable page prefeed
  -K, --no-page-prefeed      disable page prefeed

Authors: Miguel Santana & Akim Demaille, France (2001)

2001-01-16: v4.13. Free under GNU General Public License.

Downloads
  Program   a2ps413b.zip   (1.1MB)
  Docs      a2ps413d.zip   (2.0MB)
  Source    a2ps413s.zip   (2.9MB)

Get more info and versions for other OSes at La GNU a2ps home page. Note that the page's DOS version info and download link are old – use our link.

Adobe's PostScript pages.


PSUtils (PostScript utilities) — Manipulate PostScript documents.

* * * * *

[added 2005-12-08]

This is a collection of programs and scripts that adjust formatting of PostScript documents or prepare other formats for further processing. Some of the tasks can be performed in a2ps, but for small jobs, these are faster. Also, the PSUtils will do a few things that a2ps does not at all. Final output can be viewed in Ghostscript or sent to a PostScript printer. EXEs are 32-bit DJGPP compilations, require 80386+ and a DOS Protected Mode Interface (CWSDPMI or other). Scripts require installation of their languages – click the links in the script names to see what they are.

Program executables
  psbook     Rearranges pages into signatures
  psselect   Selects pages and page ranges
  pstops     Performs general page rearrangement and selection
  psnup      Put multiple pages per physical sheet of paper
  psresize   Alter document paper size
  epsffit    Fits an EPSF file to a given bounding box

Scripts
  getafm (sh)          Outputs PostScript to retrieve AFM file from printer
  showchar (sh)        Outputs PostScript to draw a character with metric info
  fixdlsrps (perl)     Filter to fix DviLaser/PS output so that PSUtils works
  fixfmps (perl)       Filter to fix framemaker documents so that psselect etc. work
  fixmacps (perl)      Filter to fix Macintosh documents with saner version of md
  fixpsditps (perl)    Filter to fix Transcript psdit documents to work with PSUtils
  fixpspps (perl)      Filter to fix PSPrint PostScript so that psselect etc. work
  fixscribeps (perl)   Filter to fix Scribe PostScript so that psselect etc. work
  fixtpps (perl)       Filter to fix Troff Tpscript documents
  fixwfwps (perl)      Filter to fix Word for Windows documents for PSUtils
  fixwpps (perl)       Filter to fix WordPerfect documents for PSUtils
  fixwwps (perl)       Filter to fix Windows Write documents for PSUtils
  extractres (perl)    Filter to extract resources from PostScript files
  includeres (perl)    Filter to include resources into PostScript files
  psmerge (perl)       Hack script to merge multiple PostScript files

Author: Angus J. C. Duggan, Scotland (1995).

2000-12-05: v1.17 for DOS.

Downloads
  Binaries   psut117b.zip   (230K)
  Docs       psut117d.zip   (80K)
  Source     psut117s.zip   (103K)

Get detailed info on all components at the PSUtils page.


Ghostscript (AFPL Ghostscript) — Views and prints PostScript and PDF files.

* * * * *

[added 2005-12-08]

Ghostscript reads PostScript and PDF files, processes them, and sends formatted output to the screen, to a file, or to a non-PostScript printer. 32-bit program, requires DOS extender (4GW, in binaries package). Distributed under Aladdin Free Public License (AFPL).

From the docs – Ghostscript works by providing:

Usage: gs [switches] [file1.ps file2.ps ...]
Most frequently used switches: (you can use # in place of =)
 -dNOPAUSE           no pause after page
 -q                  'quiet', fewer messages
 -g<width>x<height>  page size in pixels
 -r<res>             pixels/inch resolution
 -sDEVICE=<devname>  select device
 -dBATCH             exit after last file
 -sOutputFile=<file> select output file: - for stdout,
                     |command for pipe, embed %d or %ld for page #
Input formats: PostScript PostScriptLevel1 PostScriptLevel2 PDF
Available devices:
   vga ega svga16 atiw tseng tvga deskjet djet500 laserjet ljetplus ljet2p
   ljet3 ljet4 cdeskjet cdjcolor cdjmono cdj550 pj pjxl pjxl300 uniprint
   epson eps9high ibmpro bj10e bj200 bjc600 bjc800 pcxmono pcxgray pcx16
   pcx256 pcx24b pcxcmyk tiffcrle tiffg3 tiffg32d tiffg4 tifflzw tiffpack
   bmpmono bmp16 bmp256 bmp16m tiff12nc tiff24nc psmono psgray bit bitrgb
   bitcmyk jpeg jpeggray pdfwrite nullpage
Search path:
   . ; . ; c:/gs ; c:/gs/fonts
For more information, see use.txt.

Originally published by Aladdin Enterprises. Now maintained by artofcode LLC and Artifex Software.

1997-11-23: v5.10, last for DOS.

Download all
  gs510dos.zip   (890K)    Binaries
  gs510ini.zip   (805K)    Initialization files
  gs510fn1.zip   (1.2MB)   Fonts 1
  gs510fn2.zip   (1.1MB)   Fonts 2

Versions for Windows and other OSes, as well as support utils, docs, etc., are available. Go to the Ghostscript, Ghostview and GSview Home Page for info, and to one of the Mirror Sites for Ghostscript for downloads.

Binaries and source code for current, some older, and developer versions are also available at the File List page at SourceForge.


PSX — Converts PostScript documents to plain text.

*

[updated 2005-03-11]

PSX is a small (16K) and simple command line PostScript document-to-text converter that I found somewhere on a BBS. It does a very inconsistent job of translation (sometimes good, sometimes very poor) — but if you just want to browse the contents of a PostScript text file you downloaded off the Net, this program may suffice as a disk-saving alternative to Ghostscript. The main eye-sores resulting from conversion are loss of paragraph formatting and some split words. PSX is donationware. I suspect you won't find the latest version anywhere on the Net except here.

syntax: PSX [PostScriptfile] [textfile] [/option]

Both the input (PostScript) and output (text) file names may be
optionally entered at the command line. If no text file name is
specified PSX creates an ASCII file using the same name as the
PostScript file, but with ".TXT" as the DOS filename extension.
If no PostScript filename is specified, PSX will ask for one.
options:  /HELP (or /?) displays this text
          /WIDTH=n (n is a number between 40 and 132 controlling output)

Author: Frank Brown (1992-95).

1995-08-11: v1.02e.

Download psx102e.zip (15K).


Text2PDF — Converts text files to PDF.

unrated

[added 1998-12-08, updated 2005-03-02]

Text2PDF is a small (20K), versatile utility that converts a plain ASCII file to a 7-bit clean Adobe PDF file (version 1.1). It reads from standard input or a named file, and writes the PDF file to standard output.

Limitations: You cannot produce hypertext links – either to bookmarks, within the file, or to external content. You cannot add styles to headings or body elements, nor does the program reformat bullets and numbered lists. Text is formatted as is. You will probably have to tweak your text files to ensure that the word wrapping is correct.

text2pdf [options] [filename]

  Options:

  -h		show this message
  -f<font>	use PostScript <font> (must be in standard 14, default: Courier)
  -I		use ISOLatin1Encoding
  -s<size>	use font at given pointsize (default 10)
  -v<dist>	use given line spacing (default 12 points)
  -l<lines>	lines per page (default 60, determined automatically
                if unspecified)
  -c<chars>	maximum characters per line (default 80)
  -t<spaces>	spaces per tab character (default 8)
  -F		ignore formfeed characters (^L)
  -A4		use A4 paper (default Letter)
  -A3		use A3 paper (default Letter)
  -x<width>	independent paper width in points
  -y<height>	independent paper height in points
  -2		format in 2 columns
  -L		landscape mode

Author: Phil Smith (1996). Suggestion & notes by Scott Nesbitt.

1996-10-11: v1.1.

Download text2pdf.zip (12K).

Go to the text2pdf page, and to the PDF Corner, for versions for Windows and Unixes, and other related materials.


Xpdf — Toolkit for extracting text / information / images from Adobe PDF files.

unrated

[added 2000-02-09, updated 2005-12-08]

A suite of command line tools for extracting data from Adobe PDF files.

  Pdftotext   Converts PDF files to plain text. If text file is '-', the text is sent to stdout.
  Pdfinfo     Prints the contents of the 'Info' dictionary (plus some other useful information).
  Pdftops     Reads a PDF file and writes a printable PostScript file. If PS file is '-', the content is sent to stdout.
  Pdfimages   Saves images from a PDF file as Portable Pixmap (PPM), Portable Bitmap (PBM), or JPEG files.

Remarks: Programs are 32-bit DJGPP compilations, require 80386+, DOS Protected Mode Interface (CWSDPMI or other), and FPU (80387 or 80486+). File names and zip archive directory names do not all conform to DOS 8+3 conventions. Possible special requirements: gzip in path (latest versions may not need it). These programs may not be well-suited to low resource hardware. Also available for Win32, Linux, OS/2, and other OSes. Source available. Free under GNU General Public License.

Author: Derek B. Noonburg / Foo Labs (2005). Added on tip by Bob Williams (Surv-PC forum).

2005-08-17: v3.01.

Download xpdf-3.01-dos6.zip (1.6MB).

Xpdf pages at Foo Labs.

Get latest version info and files from the Download page.


Acrobat Reader — Adobe's PDF file reader.

* * *

[updated 2005-12-08]

Why use an old DOS version of Acrobat? Good question. You probably shouldn't. It seems to do fine with old PDF files, or simple ones such as tax forms but it does not support many of the latest enhancements introduced over the past couple years. Hint: The "bitmap" printing option is useful if you lack the fonts required by a given document.

Requirements: DOS 3.30+, 80386 (80486 better), 2MB RAM (4MB better), 5MB disk space, VGA, and maybe some disk "acrobatics" if you're short on disk space – the 2.5MB zip below contains a 2.5MB self-extracting EXE, which must be run to unpack the install files (2.5MB total), and then you must run the installer EXE.

Author: Adobe Systems (1993).

Download Acrodos.zip (2.5MB).
Or get these two files and unzip them to diskettes:
AdobeAcrobatDos1.zip and AdobeAcrobatDos2.zip (1.4MB each).

Also see Ghostscript.

Adobe's Acrobat pages.


CONVERT UNIX < > DOS FORMATS

Advanced, broad function text processing programs like LM, SED, or AWK can perform most of the specialized tasks described in this section, but those listed below may be better suited to the casual user or may include special options not available in other tools.

unrated

If you're looking for a converter that also handles MAC text, see NLX in the Penta Text Tools, or REMOVE, or FIXTEXT.


RUM — Converts a file between UNIX and DOS text formats.

[added 1998-04-04]

Simple, reliable, and user-friendly. Batch or interactive mode operation possible. No wildcard support. Can output to same or different filename.

Author: Jack Lee (1993). Suggested by Marianna Van Erp.

Download rum10.zip (10K).


FLIP — Converts file(s) between UNIX and DOS text formats.

[added 1998-04-04]

FLIP accepts wildcards and offers some specialized options (e.g., convert binaries, no time stamp modification of output files). Outputs to same filename.

Author: Rahul Dhesi (1989). Originally featured on Yves Bellefeuille's freeware list.

Download flip1exe.zip (15K).


ux2dos & dos2ux — Convert Unix format text < > DOS format (mixed formats handled).

[added 2004-10-03]

These are DOS counterparts of Unix utilities. From the docs:
dos2ux replaces carriage-return/newline pairs by newlines in DOS format text files to conform to UNIX requirements. Existing isolated newlines are left intact, so that no changes are made to a file which is already in UNIX format.
ux2dos adds carriage returns to isolated newlines (linefeeds) in UNIX format text files to conform to DOS requirements. Existing carriage-return/newline pairs are left intact, so that no changes are made to a file which is already in DOS format.
Access and modification time stamps of the files are preserved.
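The two conversions described in the docs come down to a pair of byte-level substitutions. Here is a minimal Python sketch of that logic; it is not the utilities' own code, the function names simply borrow the program names, and file handling and time stamp preservation are left out.

```python
import re


def dos2ux(data: bytes) -> bytes:
    """Replace CR/LF pairs with bare LF; isolated LFs are left as they are."""
    return data.replace(b"\r\n", b"\n")


def ux2dos(data: bytes) -> bytes:
    """Add a CR in front of every isolated LF; existing CR/LF pairs are left intact."""
    # The lookbehind skips any LF that already has a CR in front of it.
    return re.sub(rb"(?<!\r)\n", b"\r\n", data)


if __name__ == "__main__":
    mixed = b"first\r\nsecond\nthird\r\n"
    print(dos2ux(mixed))
    print(ux2dos(mixed))
```

Because pairs that are already correct are skipped, running either function twice gives the same result as running it once, which is how mixed-format files come out clean.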

Author: Nelson H. F. Beebe (1989).

Download ux2dos.zip (27K).


CONVERT OTHER FILE FORMATS

Notes: HTML converters are listed on the HTML page. For a good beginner's intro to the desktop publishing package TeX, see Scott Nesbitt's article TeX: The DTP Alternative.


AntiWord — Displays MS Word files, and converts to plain text, PostScript, PDF.

unrated

[added 2001-10-14, updated 2005-12-08]

AntiWord displays documents created by Microsoft Word v2, or v6 and later. It also converts from Word format to plain text, PostScript (see Ghostscript) or PDF. "A Word document can now be saved as 'formatted' text. That means with things like *bold* to show bold text, /italics/ to show italics and _undeline_ to show underlined text are added to the plain text". Use as a filter. 16- and 32-bit DOS versions available – 32-bit version is a DJGPP build, requires 80386+ and a DOS Protected Mode Interface (CWSDPMI, or other). Also available for RISC OS, Linux, Unix (with sources), BeOS, OS/2, Mac OS/X, Amiga. Freeware under GNU General Public License.

Usage: antiword [switches] wordfile1 [wordfile2 ...]

Switches: [-f|-t|-a papersize|-p papersize|-x dtd]
          [-m mapping][-w #][-i #][-Ls]

          -f formatted text output
          -t text output (default)
          -a <paper size name> Adobe PDF output
          -p <paper size name> PostScript output paper size like:
             a4, letter or legal
          -x <dtd> XML output like: db (DocBook)
          -m <mapping> character mapping file
          -w <width> in characters of text output
          -i <level> image level (PostScript only)
          -L use landscape mode (PostScript only)
          -s Show hidden (by Word) text
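
For anyone driving antiword from a script, the switches above can be assembled as in this small Python sketch. The helper name antiword_argv and its parameter names are hypothetical; only the -f, -a, -p and -w switches listed above are used, and text output is left as the default (no switch).

```python
def antiword_argv(wordfile, output="text", paper="a4", width=None):
    """Build an antiword command line from the switches listed above."""
    argv = ["antiword"]
    if output == "formatted":
        argv.append("-f")               # formatted text output
    elif output == "pdf":
        argv += ["-a", paper]           # Adobe PDF output
    elif output == "postscript":
        argv += ["-p", paper]           # PostScript output
    elif output != "text":              # plain text is the default, no switch
        raise ValueError("unknown output type: %r" % output)
    if width is not None:
        argv += ["-w", str(width)]      # width in characters of text output
    argv.append(wordfile)
    return argv
```

The resulting list can be handed to subprocess.run(), assuming an antiword executable is on the PATH; since the program works as a filter, the converted document arrives on standard output.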

Limitations: "Many images are not shown yet. Some of the images that are shown, are shown in the wrong place. PostScript output is only available in ISO 8859-1 and ISO 8859-2."

Notes: The DOS version expects its mapping files in %HOME%\ANTIWORD if HOME is set, or in C:\ANTIWORD\ if HOME is not set (mapping files are distributed in RESOUR~1\ directory of zip – place them in C:\ANTIWORD after unzipping).
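
The lookup rule in the note above amounts to the following Python sketch. It only illustrates the rule and is not taken from the Antiword source; the function name antiword_mapping_dir is hypothetical, and treating an empty HOME as unset is an assumption.

```python
import ntpath  # DOS/Windows path rules, whatever system this sketch runs on


def antiword_mapping_dir(env):
    """Return the directory the DOS version is described as searching."""
    home = env.get("HOME")
    if home:                                # HOME is set
        return ntpath.join(home, "ANTIWORD")
    return "C:\\ANTIWORD"                   # HOME is not set
```

Passing the environment in as a dictionary (for example os.environ) keeps the rule easy to check without changing any real settings.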

Author: Adri van Os, Netherlands (2005). Suggested by Robert Bull.

2005-10-21: v0.37.

Downloads:
16-bit: antiword.zip (149K)
32-bit: antiword.zip (192K)

Antiword page "...best viewed with your monitor switched on."


catdoc — Converts / extracts text from Word, Excel or PowerPoint files.

unrated

[added 1999-08-06, updated 2005-12-08]

This is a set of three utils:
catdoc: Converts MS Word files to plain text or other formats
xls2csv: Converts Excel spreadsheets to comma-separated value (CSV) text
catppt: Extracts readable text from PowerPoint files
From the docs:
catdoc behaves much like [Unix] cat but it reads MS-Word file and produces human-readable text on standard output. Optionally it can use LaTeX escape sequences for characters which have special meaning for LaTeX. It also makes some effort to recognize MS-Word tables...additional output formats, such as HTML can be easily defined...uses internal Unicode representation of text, so it is able to convert texts when charset in source document doesn't match charset on target system.
xls2csv reads MS-Excel spreadsheet and dumps its content as comma-separated values to stdout. Numbers are printed without delimiters, strings are enclosed in the double quotes. Double-quotes inside string are doubled.
catppt reads MS-PowerPoint presentations and dumps content to stdout.
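
The quoting rule that the docs give for xls2csv output can be illustrated with a short Python sketch. This is not the program's own code: the names csv_field and csv_row are hypothetical, and the way numbers are rendered here (Python's str) is an assumption.

```python
def csv_field(value):
    """Format one cell: numbers bare, strings quoted with inner quotes doubled."""
    if isinstance(value, (int, float)):
        return str(value)
    return '"' + str(value).replace('"', '""') + '"'


def csv_row(values):
    """Join the formatted cells of one row with commas."""
    return ",".join(csv_field(v) for v in values)
```

For example, csv_row([1, 2.5, 'say "hi"', "a,b"]) gives 1,2.5,"say ""hi""","a,b" so a comma inside a string stays inside its quotes.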

Notes: 16-bit DOS operation only – no support for 32-bit Windows Long File Names. Docs in plain text, Unix man and HTML formats. Source included. Source-only distribution is also available (Unixes and DOS). Released under GNU General Public License.

Author: Victor Wagner, Russia (2005).

2005-05-01: v0.94.

Download catdoc-0.94.zip (346K). Unzip with "create directories" option.

Get more info, history & updates at the catdoc & xls2csv page

Also see the author's DOS utilities page.


HelpDeco — Converts Win 3.x/95 HLP files to RTF format.

* * * *

This DOS program is a Windows 3.x/95 *.HLP file decompiler. It's useful to the non-programmer because it has an option /r that converts HLP files to RTF format, which can be further converted to plain text (by word processors) or to HTML (e.g., see Martha). Also included: SPLITMRB and ZAPRES, image processors. A large program (237K), but fast. Documentation is bilingual (German/English) and may be difficult to follow.

From the docs: HelpDeco...
will recreate all source files (RTF, HPJ, MVP, BMP, WMF, SHG, MRB,...) from all Windows 3.x/'95 .HLP help files and most .MVB multi media viewer titles. Load the resulting RTF file into WinWord to view and print, or modify the topics of the help file and rebuild it using the appropriate help compiler (HC30, HC31, HCP, HCW, HCRTF, WMVC, MMVC, MVC, not included, available at Microsoft). The rebuilt helpfile will not be identical, but should behave like the original, even in respect to inter-HLP-file links. All text, formatting, hypertext links, pictures, macros etc. will be conserved...It will run as a 16 bit application from MS-DOS command line and as a 32 bit application from Windows 95/NT command line.

Author: Manfred Winterhoff, Germany (1997).

1997-01-28: v2.1.

Downloads:
helpdeco.zip (139K): EXEs
helpdc21.zip (218K): EXEs, source, HLP file format description

Get more info and external support utils at the HelpDeco page.


WP2LaTeX — Converts WordPerfect 3.x-8.x, HTML, RTF & other document files to LaTeX.

unrated

[updated 2005-12-08]

From the docs:
WP2LaTeX is a program designed to translate WordPerfect documents into LaTeX 2.09 and LaTeX 2e. The current version is able to cope with Macintosh WordPerfect 3.x, WP 4.x, WP 5.x and WP 6.x documents. (WP 7.x & 8.x have the same binary file format as WP6, so no additional conversion module is necessary.) WP2LaTeX is NOT a text processor, and converted documents will require a LaTeX document processor.
The current version can convert many features, for example: headers, tables, equations, centered, right- and left-aligned text, many extended characters (Greek, math, Cyrillic, accented) and, of course, normal text.
Supported input formats are WP3.x, WP4.x, WP5.x, WP6.x and even several non-WP formats such as abiword, Accent, MTEFF, OLE Stream, HTML, RTF, T602, UNICODE and WORD.

Other: Accepts Unicode input. TIPA font support. Program messages in choice of English / Czech / German. Download package contains executables for DOS16, DOS32 (DJGPP), OS/2, Linux, Win32. Source code also included.

Maintainer: Jaroslav Fojtik, Czech Republic. Suggested by Scott Nesbitt.

2005-02-06: v3.23.

Download wp2latex-3.23.zip (3.1MB).

Online User's Guide.

Visit the WP2LaTeX Homepage for latest version and support utilities.


Xray — extract plain text present in binary files.

* * *

Xray extracts plain text from binary files. One use is to show the text contained in executables, DLLs, etc. It can also serve as a crude means of getting plain text from any word processor file, although formatting is lost in the process.
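
The general technique can be sketched in a few lines of Python: scan the bytes and keep every run of printable characters above some minimum length. This is only an illustration of the idea, not Xray's actual method; the function name extract_text and the 4-character minimum are assumptions.

```python
import re


def extract_text(data: bytes, min_len: int = 4):
    """Return runs of printable ASCII at least min_len characters long."""
    # 0x20-0x7e covers space through tilde, i.e. the printable ASCII range
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    return [run.decode("ascii") for run in pattern.findall(data)]
```

Reading a file in binary mode and passing its contents to extract_text() yields the embedded messages, file names and similar text; raising min_len cuts down on short accidental matches.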

Download xray105.zip (7.5K).

Also see the similar program ReadText, rt101.zip (6K).




©1994-2004, Richard L. Green.
This Edition ©2004-2005, Richard L. Green and Short.Stop.