Qbasicnews.com
Topic: Ideas for a text "unformatter"
Antoni Gual
« on: May 14, 2004, 10:13:56 AM »

I'm back programming in QB...

I'm doing a text "unformatter"

Present wishlist:
-End-of-line conversion from UNIX and Mac to DOS
-ANSI to ASCII or reverse
-Smart deletion of "hardened" soft end-of-lines, trying to preserve tables and lists
-Text to HTML and reverse
-Smart deletion of page headers, footers and page numbers
-Smart conversion of the spaces used by word processor newbies to align tables
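For the first item, a minimal sketch in Python (not the QB code of the draft), assuming the file is read as raw bytes and fits in memory. UNIX ends lines with LF, classic Mac OS with CR, and DOS with the CR LF pair, so every ending is first folded to LF and then expanded to CR LF:

```python
def to_dos_line_endings(data: bytes) -> bytes:
    """Convert UNIX (LF) and classic Mac (CR) line endings to DOS (CR LF).

    Existing CR LF pairs are preserved, so the function is safe to run
    on a file that already uses DOS line endings.
    """
    # Fold every ending to a bare LF: CR LF -> LF first, then lone CR -> LF.
    unified = data.replace(b"\r\n", b"\n").replace(b"\r", b"\n")
    # Expand each LF to the DOS pair.
    return unified.replace(b"\n", b"\r\n")
```

Because a DOS file comes out unchanged, this step can be applied blindly before any other processing.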

Target: allow preformatted plain text or HTML to be imported into a word processor or a PDB converter (for Pocket PC).

Would you like to suggest any additional features?

Antoni
na_th_an
« Reply #1 on: May 14, 2004, 02:32:53 PM »

A word wrapper and justifier would be neat.

SCUMM (the band) on Myspace!
ComputerEmuzone Games Studio
underBASIC, homegrown musicians
[img]http://www.ojodepez-fanzine.net/almacen/yoghourtslover.png[/i
Antoni Gual
« Reply #2 on: May 16, 2004, 09:54:14 PM »

Well, the idea is just the opposite: to return the text as free of formatting as possible, so a word processor or a browser can adapt it (doing its own word wrapping and justification) to any page width you want.
It would be a pre-word-processor tool, to avoid having to remove line ends manually.
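A minimal Python sketch of that core step (not the QB code of the draft), assuming paragraphs are separated by blank lines and every other line break is a "hardened" soft line end that should become a single space:

```python
def unwrap(text: str) -> str:
    """Join hard-wrapped lines so that each paragraph becomes one line.

    Paragraphs are assumed to be separated by blank lines; runs of blank
    lines collapse to one, and the result ends with a newline.
    """
    paragraphs = []
    current = []
    for line in text.splitlines():
        if line.strip():
            # Keep the first line's indentation; trim continuation lines
            # so that exactly one space separates the joined words.
            current.append(line.rstrip() if not current else line.strip())
        elif current:
            paragraphs.append(" ".join(current))
            current = []
    if current:
        paragraphs.append(" ".join(current))
    return "\n\n".join(paragraphs) + "\n"
```

Joining with an explicit space is what makes a line ending in "method" followed by one starting with "uses" come out as "method uses" rather than "methoduses". Tables and listings are not protected here; they would be joined like any other paragraph.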

na_th_an
« Reply #3 on: May 16, 2004, 10:59:35 PM »

Hmm, now I see what you want to do. Great, then! I've been coding little proggies in QB for such tasks for years; it would be great to have a full-featured application.

Antoni Gual
« Reply #4 on: May 16, 2004, 11:11:18 PM »

Well, here is a first draft (end-of-line removal only):
http://www.geocities.com/antonigual/qbsource/unform2.zip
Geocities: Right-click and save as...

Moneo
« Reply #5 on: May 17, 2004, 11:50:51 PM »

Antoni,
Hi, how have you been?

If you're interested, I have a tested "C" program to convert UNIX text to DOS text, handling the different end-of-line characters. I can mail you the code.
*****
adosorken
« Reply #6 on: May 18, 2004, 01:39:10 AM »

Wow, very good, sir! :D

I'd knock on wood, but my desk is particle board.
Antoni Gual
« Reply #7 on: May 18, 2004, 04:08:19 AM »

Good morning, everyone!

I have been busy learning Windows programming in C, and then I became too lazy to program anything. Not a line of code since the nine-liner contest at QBNZ. Was that in February? In fact, I have been trading a lot of music and books P2P.

Reading the books I have on my Pocket PC is what gave me the idea for the program. Most of them are in Acrobat format, and Acrobat Reader for PPC is a pain in the ass, so I export them as text, and I hope to use my program to free the text from unwanted carriage returns.

I see most of you have hacked together proggies to deal with that kind of thing. I want to gather all of that into a single program. Any idea is welcome!

Moneo: Thanks. I already solved that part of the problem in the draft version.

Z!re
« Reply #8 on: May 18, 2004, 05:05:06 AM »

So, basically, your program will transform formatted text into pure text?

"this
is
an ex
ample!"

becomes:
"this is an example"


Am I right?
Antoni Gual
« Reply #9 on: May 18, 2004, 05:35:59 AM »

Yes, that is the idea: to have plain, unformatted text to import into a word processor.
But, to make it more complicated, I want to preserve the formatting of tables and program listings...
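One possible way to decide which lines to leave alone is a per-line heuristic. The Python sketch below is an assumption, not the method of the draft: a line is treated as part of a table if it contains box-drawing characters, a "|", a tab, or a run of three or more spaces between two words. The names and the three-space threshold are hypothetical choices.

```python
import re

# Box-drawing characters (U+2500..U+257F), as used in DOS-era tables.
BOX_DRAWING = re.compile("[\u2500-\u257f]")
# Three or more spaces *between* two non-space characters, as in
# space-aligned columns. Leading indentation alone does not match, so the
# indented first line of an ordinary paragraph is not flagged.
ALIGNED_COLUMNS = re.compile(r"\S {3,}\S")


def looks_preformatted(line: str) -> bool:
    """Heuristic: True if the line seems to belong to a table or listing,
    so its line break and spacing should be kept as they are."""
    return bool(
        BOX_DRAWING.search(line)
        or ALIGNED_COLUMNS.search(line)
        or "|" in line   # ASCII-drawn tables
        or "\t" in line  # tab-aligned columns
    )
```

An unwrapper would copy flagged lines through unchanged and only join the runs of unflagged lines between them. The threshold is three spaces because typed text often puts two spaces after a full stop (as in the MBF example in this thread). Program listings without any of these markers would still need some other rule.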

At the same time, it would convert Unix and Mac line endings to DOS and do character-set conversion, for example ASCII to Windows, or Windows to HTML.
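Assuming "ASCII" here means text in a DOS OEM code page and "Windows" means Windows-1252, both conversions can be sketched with Python's built-in codecs. CP437 is only the default below; a Spanish DOS setup may use CP850 instead, so the code page is a parameter.

```python
import html


def dos_to_windows(data: bytes, dos_codepage: str = "cp437") -> bytes:
    """Re-encode DOS (OEM code page) text as Windows-1252 ("ANSI").

    Characters Windows-1252 cannot represent, such as the box-drawing
    characters, become '?'.
    """
    return data.decode(dos_codepage).encode("cp1252", errors="replace")


def windows_to_html(data: bytes) -> str:
    """Turn Windows-1252 text into pure-ASCII HTML text.

    &, <, > and quotes are escaped, and every non-ASCII character is
    written as a numeric character reference such as &#241;.
    """
    # errors="replace": the few byte values undefined in cp1252 become U+FFFD.
    text = html.escape(data.decode("cp1252", errors="replace"))
    return text.encode("ascii", errors="xmlcharrefreplace").decode("ascii")
```

Note that tables drawn with box characters do not survive the trip to Windows-1252; they would need to be converted to ASCII art first or kept in a DOS font.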

One thing I want to do, but don't know how, is to remove page headers, footers and page numbers. Any ideas?
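One possible approach, sketched in Python under two assumptions: the exported text marks page breaks with form-feed characters (pdftotext does this; whether the Acrobat export does is an assumption), and a running header or footer is the first or last non-blank line of a page that repeats, apart from its digits, on many pages. The function names and the 50% threshold are hypothetical.

```python
import re
from collections import Counter


def _shape(line: str) -> str:
    """Digit runs become '#', so 'Page 12' and 'Page 13' share one shape."""
    return re.sub(r"\d+", "#", line.strip())


def _edge(lines: list, top: bool):
    """Index of the first (top=True) or last non-blank line, or None."""
    order = range(len(lines)) if top else range(len(lines) - 1, -1, -1)
    return next((i for i in order if lines[i].strip()), None)


def strip_running_lines(text: str, min_share: float = 0.5) -> str:
    """Remove running headers/footers from form-feed-separated pages.

    The first and the last non-blank line of each page are dropped when
    their digit-normalized shape occurs in that same position on at
    least min_share of the pages (and on at least two pages).
    """
    pages = [page.split("\n") for page in text.split("\f")]
    threshold = max(2, min_share * len(pages))
    for top in (True, False):  # one pass for headers, one for footers
        edges = [_edge(page, top) for page in pages]
        counts = Counter(_shape(page[i]) for page, i in zip(pages, edges) if i is not None)
        for page, i in zip(pages, edges):
            if i is not None and counts[_shape(page[i])] >= threshold:
                del page[i]
    return "\f".join("\n".join(page) for page in pages)
```

Comparing digit-normalized shapes is what lets a footer such as "- 12 -" match "- 13 -". Text without form feeds is treated as a single page and returned unchanged; books whose headers alternate between the book title and the chapter title may need a lower min_share.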

An example. The [after] text is the output of what I have so far (except for the CODE tag):
Quote

[before]

     Double precision MBF numbers use only eight bits for an exponent
rather than eleven, trading a reduced absolute range for increased
resolution.  That is, there are fewer exponent bits than the IEEE method
uses, which means that extremely large and extremely small numbers cannot
be represented.  However, the additional mantissa bits offer more absolute
digits of precision.


The IEEE format:

┌────────┬────────┬────────┬────────┐
│SEEEEEEE│EMMMMMMM│MMMMMMMM│MMMMMMMM│
└────────┴────────┴────────┴────────┘


[after]

     Double precision MBF numbers use only eight bits for an exponent rather than eleven, trading a reduced absolute range for increased resolution.  That is, there are fewer exponent bits than the IEEE methoduses, which means that extremely large and extremely small numbers cannotbe represented.  However, the additional mantissa bits offer more absolute digits of precision.


The IEEE format:
Code:

┌────────┬────────┬────────┬────────┐
│SEEEEEEE│EMMMMMMM│MMMMMMMM│MMMMMMMM│
└────────┴────────┴────────┴────────┘


na_th_an
« Reply #10 on: May 18, 2004, 09:48:33 AM »

For page footers et al., all I can think of is a regular-expression evaluator plus a text replacer: you create a regular expression that matches the format of the page numbers, and then use find-and-replace to eliminate the matching text.
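A sketch of that idea in Python, whose re module stands in for the regular-expression evaluator (in QB the matching would have to be hand-coded). The pattern is a hypothetical example for lines holding nothing but a page number, such as "12", "- 12 -" or "Page 12 of 300"; it accepts both LF and CR LF line ends.

```python
import re

# Hypothetical pattern: a whole line that is only a page number.
PAGE_NUMBER_LINE = re.compile(
    r"""
    ^[ \t]*
    (?: -[ \t]*\d+[ \t]*-                        # "- 12 -"
      | (?:page[ \t]+)?\d+(?:[ \t]+of[ \t]+\d+)?  # "12", "Page 12", "Page 12 of 300"
    )
    [ \t]*\r?$\n?                                # rest of the line and its line end
    """,
    re.IGNORECASE | re.MULTILINE | re.VERBOSE,
)


def remove_page_numbers(text: str) -> str:
    """Delete every line matching PAGE_NUMBER_LINE, line end included."""
    return PAGE_NUMBER_LINE.sub("", text)
```

Because the match is anchored to whole lines, a line that merely starts with a number ("42 is not a page number") is kept; a line holding only a number inside a table or listing would still be removed, so the pattern has to be tuned per book.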

Antoni Gual
« Reply #11 on: May 18, 2004, 10:31:40 AM »

Yes, I considered regular expressions; the problem is implementing them in QB...
Maybe the solution would be to run SED first and then my program, as SED doesn't provide an easy solution to selective end-of-line removal.
