Qbasicnews.com

QBasic => QB Discussion & Programming Help => Topic started by: alajandrolieber on September 17, 2005, 10:32:17 AM



Title: "Out of string space" opening a big secuencial fil
Post by: alajandrolieber on September 17, 2005, 10:32:17 AM
I have to open a 140 MB text file with 48- and 49-byte records and write a new file containing only the records that have "AC" at position 44.

I use:

OPEN "D:\FILE.TXT" FOR INPUT AS #1
OPEN "C:\CUIT\NEW.TXT" FOR OUTPUT AS #2

FOR RECORD& = 1 TO 1000        ' THERE ARE NEARLY 3M RECORDS
    LINE INPUT #1, RECORD$
    IF MID$(RECORD$, 44, 2) = "AC" THEN WRITE #2, RECORD$
NEXT RECORD&
   

All I get is the following error:

Out of string space.

But if I replace D:\FILE.TXT (nearly 140 MB) with C:\CUIT\PARTFILE.TXT (100 KB)

no errors occur.

Could it be that OPEN tests the file size before reading the first record?
Does OPEN have any limit on the size of a file it can open?


Alejandro Lieber
Rosario Argentina


Title: "Out of string space" opening a big secuencial fil
Post by: MystikShadows on September 17, 2005, 11:53:11 AM
Hi alajandrolieber

The file limit (which is a DOS limit not a QB limit per se) is 2Gb.

I'm not sure what the file looks like (the big one that is) but one of the things that can cause such an error is if perhaps a line is bigger than the string variable allows.

Since you have a LINE INPUT # statement, it would seem that one of those lines is indeed longer than the string type allows. If that's the case, open the file in a Windows text editor and see if you can spot where the problem is.

It must be the cause.


Title: "Out of string space" opening a big secuencial fil
Post by: Moneo on September 17, 2005, 04:19:27 PM
Che Alejandro,

Use PRINT # instead of WRITE #.
The use of WRITE # is strongly NOT recommended except for very special cases when you want quotes around string data. Try this, although it may not solve the "out of string space" problem.

You should not use the FOR loop of 1 to 1000. If there are fewer than 1000 records in the file, you will get an error. If there are more than 1000, you will not process the extra records.
Use the following to process all the records:
Code:

DO WHILE NOT EOF(1)
      LINE INPUT #1, RECORD$
      IF MID$(RECORD$, 44, 2) = "AC" THEN PRINT #2, RECORD$
LOOP

What is the format of the input file D:\FILE.TXT ?
It should be text only.
Are all the records 48 or 49 bytes long?
How was this file created?
Are there other programs that can read it successfully?

Like MystikShadows says, maybe this input file has been corrupted somehow. If you can, view it with a text editor like he says. If the file is really 140MB, few editors will handle it.

Let us know what happened.
*****


Title: "Out of string space" opening a big secuencial fil
Post by: alajandrolieber on September 17, 2005, 07:48:34 PM
Thanks all.

The input file is pure text, approx. 3,200,000 records; I cannot know the exact number because the records are 48 and 49 bytes long.

This file is the complete list of all Argentine taxpayers. It should have only 49-byte records, which would make it easy to open with random file access.

Here in Argentina we have V.A.T. tax, so we should know before making an invoice, if the buyer is included in that list.

Up to now I have used XTREE Gold's View to find any text in that file, but it reads the file sequentially, so it can take up to 2 minutes to find a record.

I think QBasic can read approx. 2M records in a random file.

I used FOR T=1 to 1000 just to begin programming.
It just cannot read the first record.

I repeat: the program can read the data and write it to a new file when I use only a small part of that huge file.

Alejandro Lieber


Title: "Out of string space" opening a big secuencial fil
Post by: Moneo on September 17, 2005, 10:44:01 PM
Che,
You said "I cannot read the first record".
This makes me suspect that this "text" file does not have records that end with a Carriage Return and Line Feed (CRLF) at the end of each record. Maybe this text file was created by a "C" program or on a Unix machine, which only put a Line Feed on the end of each record.

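When you do a LINE INPUT, it expects to find a Carriage Return (normally followed by a Line Feed) at the end of each record. It keeps reading until it finds one, or it blows up with an "Out of string space" error once the record outgrows what a string can hold (a QBasic string cannot exceed 32,767 characters).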

You will probably need a hex editor to see what's on the end of each record. See if XTree has a hex option. Another quick way is to have XTree search the file for a CRLF. If it doesn't find any, then we know the file doesn't have any.

Let me know if this is the problem, so we can figure out how to fix it. For example: if XTree has a search and replace option, you could search for a Line Feed and replace with Carriage Return and Line Feed.
*****


Title: "Out of string space" opening a big secuencial fil
Post by: MystikShadows on September 18, 2005, 06:11:54 AM
Moneo is right :-).

Some freely available text editors offer the option to save for Linux or save for Windows (which would add that CRLF at the end of every line).


Title: "Out of string space" opening a big secuencial fil
Post by: alajandrolieber on September 18, 2005, 09:20:31 AM
Both files, the approx. 140 MB one and the other, an approx. 100 KB part of the first one,
have character 0A at the end of each 48/49-byte record.

The program works perfectly with the small one, but gives "Out of string space" with the big one.

The problem is not the record delimiter.

I also did a:

PRINT FRE("")  and got 29856

Alejandro Lieber
Rosario Argentina


Title: "Out of string space" opening a big secuencial fil
Post by: Moneo on September 18, 2005, 01:14:28 PM
Quote from: "alajandrolieber"
Both files, the approx. 140 MB one and the other, an approx. 100 KB part of the first one,
have character 0A at the end of each 48/49-byte record.

The program works perfectly with the small one, but gives "Out of string space" with the big one.

The problem is not the record delimiter....

Alejandro, I am trying to help you, but you are not listening carefully to what I tell you. I have experience in these matters. Tell me if you want me to keep helping you.

You said ".... have character 0A at the end of each ....". Of course. A record ending in only a Line Feed (hex 0A) will end with a 0A just like a record ending with a Carriage Return and Line Feed (hex 0D0A) also ends with a 0A.
Go back and look at the record delimiters again and see what character comes JUST BEFORE the ending 0A. Check both the long and short files. Let me know the results of this.
*****


Title: "Out of string space" opening a big secuencial fil
Post by: alajandrolieber on September 18, 2005, 09:01:31 PM
Moneo: of course I am listening to you, and I am very grateful.
I have programmed for years, but only with RANDOM files, never with SEQUENTIAL ones.

The character before the end-of-record 0A is ordinary text.
It is the record data, with character codes between 33 and 127.

Could someone try to open any BIG text file with OPEN and report the results?

Alejandro Lieber
Rosario  Argentina


Title: "Out of string space" opening a big secuencial fil
Post by: Moneo on September 18, 2005, 09:50:02 PM
Quote from: "alajandrolieber"
Moneo: of course I am listening to you, and I am very grateful.
I have programmed for years, but only with RANDOM files, never with SEQUENTIAL ones.

The character before the end-of-record 0A is ordinary text.
It is the record data, with character codes between 33 and 127.
....

Fine. My experience with different dialects of Basic since 1969 has been with sequential access, never random.

If the character before the end of record 0A is NOT a 0D, then the records of this file cannot  be read with LINE INPUT. I suspect that RANDOM won't work either.

I assume that this is the case for the large file. I suspect that the records of the small file, which can be processed, end in 0D0A. How was this small file created?

Anyway, I see the following options for you:
1) Obtain another version of the large file with records delimited by Carriage Return and Line Feed (0D0A).

2) If XTree has the option, search for 0A and replace with 0D0A, generating a new large file.

3) I had the same problem 10 years ago, and wrote a utility program to replace the record delimiters. I would need to find this program and send it to you.

So, which of the above 3 options is the best for you?
*****


Title: "Out of string space" opening a big secuencial fil
Post by: DrV on September 18, 2005, 10:03:53 PM
You could also avoid using LINE INPUT and just read a buffer of a certain number of bytes, searching for 0A and splitting at those points.
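
A rough sketch of that buffered approach, in case it helps. It assumes every record ends in a single 0A and that records are only a few dozen bytes long. The file names are the ones from the first post; CHUNK and the variable names are just illustrative choices, and this has not been run against the real 140 MB file.

```basic
' Sketch only: read the file in fixed-size chunks and split records at LF.
' CHUNK, BYTESLEFT&, CARRY$ and RECSTART% are illustrative names.
CONST CHUNK = 4096
LF$ = CHR$(10)

OPEN "D:\FILE.TXT" FOR BINARY AS #1
OPEN "C:\CUIT\NEW.TXT" FOR OUTPUT AS #2

BYTESLEFT& = LOF(1)          ' bytes still to be read
CARRY$ = ""                  ' unfinished record left over from the last chunk

DO WHILE BYTESLEFT& > 0
    IF BYTESLEFT& < CHUNK THEN N% = BYTESLEFT& ELSE N% = CHUNK
    BUF$ = SPACE$(N%)        ' in BINARY mode GET reads LEN(BUF$) bytes
    GET #1, , BUF$
    BYTESLEFT& = BYTESLEFT& - N%
    BUF$ = CARRY$ + BUF$

    RECSTART% = 1
    DO
        P% = INSTR(RECSTART%, BUF$, LF$)
        IF P% = 0 THEN EXIT DO
        RECORD$ = MID$(BUF$, RECSTART%, P% - RECSTART%)
        IF MID$(RECORD$, 44, 2) = "AC" THEN PRINT #2, RECORD$
        RECSTART% = P% + 1
    LOOP
    CARRY$ = MID$(BUF$, RECSTART%)   ' keep the partial record for the next pass
LOOP

' A final record without a trailing LF is still waiting in CARRY$.
IF LEN(CARRY$) > 0 THEN
    IF MID$(CARRY$, 44, 2) = "AC" THEN PRINT #2, CARRY$
END IF

CLOSE #1, #2
```

Since PRINT # ends each line with CRLF, NEW.TXT comes out in DOS format. If the input contained no 0A at all, CARRY$ would keep growing and the program would eventually stop with the same "Out of string space" error.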


Title: "Out of string space" opening a big secuencial fil
Post by: Moneo on September 18, 2005, 10:41:07 PM
Quote from: "DrV"
You could also avoid using LINE INPUT and just read a buffer of a certain number of bytes, searching for 0A and splitting at those points.

Yes, this could be done, but then the I/O becomes the major part of this program, with figuring out where records begin and end, whether a record spans the buffer size, and detecting end-of-file.

You would be programming around a file input problem instead of fixing the file problem in the first place.
*****


Title: "Out of string space" opening a big secuencial fil
Post by: Moneo on September 19, 2005, 04:30:28 PM
Alejandro,

Of the 3 options that I mentioned before, I see only options 1 and 3 working correctly.
Option 1 goes back to the source of the file to get a version delimited with CRLF only. This is the best option, since it saves us from having to manipulate the file.

Option 3 uses my utility to convert the delimiters to CRLF. I found this utility program, which I wrote in C, and the documentation indicates that UNIX text files can contain several combinations of record delimiters, not just Line Feed alone, like:
Carriage Return, Carriage Return, Line Feed
Carriage Return and Line Feed
Carriage Return and Form Feed
Carriage Return only
Line Feed only
Form Feed only

Not knowing what system this file was created under, if they won't do the conversion to CRLF for us (option 1), then I suggest we use my utility.

Option 2, using XTree, becomes a manual job of running the search and replace for every combination of delimiters.

No matter what method we use to fix the file, your original program should include a test of the length of each record. As per your specifications, the records must be 48 or 49 bytes long. If your program finds a record that is not 48 or 49, then there was some file conversion problem.
*****


Title: "Out of string space" opening a big secuencial fil
Post by: Oz on September 19, 2005, 07:36:52 PM
may i also suggest a BINARY file

just even out the bytes per record to some fixed size (probably 50 would be easiest) by adding spaces or "NULL" characters

then, you could search for records by using

[syntax="qbasic"]GET #ff, (rec_num& - 1) * 50 + 1, somevar$   ' rec_num& counts from 1; somevar$ must already be 50 characters long[/syntax]

that would be the easiest for records
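
A minimal sketch of that kind of direct lookup, written with RANDOM mode rather than BINARY so that QBasic does the position arithmetic itself. It assumes the data has already been rewritten so that every record occupies exactly 50 bytes, padding and any delimiter included. The file name FIXED.DAT, the type name and the record number are hypothetical.

```basic
' Sketch only: fetch one record by number from a fixed-length file.
' FIXED.DAT, TaxRecord and the record number are hypothetical.
TYPE TaxRecord
    Body AS STRING * 50
END TYPE
DIM Rec AS TaxRecord

OPEN "C:\CUIT\FIXED.DAT" FOR RANDOM AS #1 LEN = LEN(Rec)

RecNum& = 123456                     ' records are numbered from 1
IF RecNum& >= 1 AND RecNum& <= LOF(1) \ LEN(Rec) THEN
    GET #1, RecNum&, Rec
    PRINT RTRIM$(Rec.Body)
ELSE
    PRINT "No such record"
END IF

CLOSE #1
```

The record number is a LONG because an INTEGER stops at 32,767, far short of the roughly 3 million records in this file.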

Oz~


Title: "Out of string space" opening a big secuencial fil
Post by: Moneo on September 19, 2005, 08:19:38 PM
Oz,
That's a good idea for later, but first we need to get this "text" file into a standard format with only CRLF as record delimiters.
*****


Title: "Out of string space" opening a big secuencial fil
Post by: alajandrolieber on September 19, 2005, 09:36:59 PM
Moneo was right.

OPEN will not detect a new record with only 0A as the last character.

So QBasic could not open  the file because it saw it as a one 140Mby long record file.

I have the program GSAR: General Search and Replace Utility by Tormod Tjaberg.

It seems it can do  what I need:

gsar -ud -o file.txt

will rewrite  file.txt  replacing each 0A with 0D0A (UNIX to DOS)

I will try it next wednesday.

 Alejandro Lieber
Rosario  Argentina


Title: "Out of string space" opening a big secuencial fil
Post by: Moneo on September 19, 2005, 11:20:13 PM
Alejandro, Please read my comments below.

Quote from: "alajandrolieber"
Moneo was right.

OPEN will not detect a new record with only 0A as the last character.
NO. "OPEN" DOES NOT DETECT RECORD DELIMITERS. THE PROBLEM OCCURS ON THE "LINE INPUT".

So QBasic could not open  the file because it saw it as a one 140Mby long record file.
AGAIN, NOT A QBASIC "OPEN" PROBLEM. THE FIRST EXECUTION OF "LINE INPUT" SAW THE HUGE RECORD.

I have the program GSAR: General Search and Replace Utility by Tormod Tjaberg.

It seems it can do  what I need:

gsar -ud -o file.txt

will rewrite  file.txt  replacing each 0A with 0D0A (UNIX to DOS)

I will try it next wednesday.
IN A PREVIOUS POST, I MENTIONED AT LEAST 5 OTHER COMBINATIONS OF RECORD DELIMITERS WHICH "COULD" APPEAR ON THE FILE. UNLESS YOU ARE ABSOLUTELY SURE THAT ALL THE RECORDS ARE DELIMITED ONLY BY ONE LINE FEED, THEN THIS GSAR PROGRAM WILL NOT WORK AND ONLY SCREW UP THE RECORD DELIMITERS EVEN MORE.
IF YOU KNOW EXACTLY HOW MANY RECORDS THE FILE HAS, YOU COULD USE XTREE OR OTHER UTILITY TO COUNT THE NUMBER OF LINE FEEDS THAT ARE ON THE FILE. IF THESE COUNTS COINCIDE EXACTLY, THEN YOU CAN USE THE GSAR UTILITY.

 Alejandro Lieber
Rosario  Argentina


Alejandro,
Where was this file produced? Was it on Unix? Is it a "print file"? Are you familiar with the program or utility that produced the exact version of this large file?

I've asked you these questions before, but you don't answer. You seem to want to find alternative solutions. I've told you before that I was confronted with this problem before. It is not a simple problem, and the solution is not simple either.

The simplest solution, as I've said before, is to go back to the source of the data file, and request a version with records delimited only by CRLF. Is this option not feasible?

Again, if you run the GSAR program without the assurance that ALL the records are delimited only by one Line Feed, that is, without having counted, then you will be headed for disaster. Believe me.
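
The count can also be done in QBasic itself. The following is only a sketch: it tallies the Line Feed, Carriage Return and Form Feed bytes so that the totals can be compared with the expected number of records. CHUNK and the counter names are illustrative, and the byte-by-byte loop will be slow on a 140 MB file.

```basic
' Sketch only: count LF, CR and FF bytes in the file.
' CHUNK and the counter names are illustrative.
CONST CHUNK = 4096

OPEN "D:\FILE.TXT" FOR BINARY AS #1

BYTESLEFT& = LOF(1)
LFCOUNT& = 0
CRCOUNT& = 0
FFCOUNT& = 0

DO WHILE BYTESLEFT& > 0
    IF BYTESLEFT& < CHUNK THEN N% = BYTESLEFT& ELSE N% = CHUNK
    BUF$ = SPACE$(N%)        ' in BINARY mode GET reads LEN(BUF$) bytes
    GET #1, , BUF$
    BYTESLEFT& = BYTESLEFT& - N%

    FOR I% = 1 TO N%
        SELECT CASE ASC(MID$(BUF$, I%, 1))
            CASE 10
                LFCOUNT& = LFCOUNT& + 1
            CASE 12
                FFCOUNT& = FFCOUNT& + 1
            CASE 13
                CRCOUNT& = CRCOUNT& + 1
        END SELECT
    NEXT I%
LOOP

CLOSE #1

PRINT "Line Feeds:      "; LFCOUNT&
PRINT "Carriage Returns:"; CRCOUNT&
PRINT "Form Feeds:      "; FFCOUNT&
```

If the Carriage Return and Form Feed totals are zero and the Line Feed total matches the number of records, a plain 0A to 0D0A replacement should be safe.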
*****


Title: "Out of string space" opening a big secuencial fil
Post by: alajandrolieber on September 19, 2005, 11:47:33 PM
>IN A PREVIOUS POST, I MENTIONED AT LEAST 5 OTHER COMBINATIONS
 >OF RECORD DELIMITERS WHICH "COULD" APPEAR ON THE FILE.
>UNLESS YOU ARE ABSOLUTELY SURE THAT ALL THE RECORDS ARE
>DELIMITED ONLY BY ONE LINE FEED, THEN THIS GSAR PROGRAM WILL
>NOT WORK AND ONLY SCREW UP THE RECORD DELIMITERS EVEN
>MORE.


All the records are delimited by one LF



>Where was this file produced? Was it on Unix? Is it a "print file"? Are you
>familiar with the program or utility that produced the exact version of this large
>file?


The file was produced by the Argentine government.
 


>The simplest solution, as I've said before, is to go back to the source of the
>data file, and request a version with records delimited only by CRLF. Is this
>option not feasible?

I have told them about the problem of mixing 48- and 49-byte records.
Nothing happened in the following release.

>Again, if you run the GSAR program without the assurance that ALL the
>records are delimited only by one Line Feed, that is, without having counted,
>then you will be headed for disaster. Believe me.

I believe you. But the original file is in a CD, so I can try several  ideas.

Alejandro Lieber


Title: "Out of string space" opening a big secuencial fil
Post by: Moneo on September 20, 2005, 02:57:05 PM
Ok, Alejandro, it looks like you are in control of all the information.

So, are you going to run the GSAR program? Later, when you have a new large file with records delimited by CRLF, don't forget to put the following logic that I suggested into your original program:

..... your original program should include a test of the length of each record. As per your specifications, the records must be 48 or 49 bytes long. If your program finds a record that is not 48 or 49, then there was some file conversion problem.

You should also add a record count to the program, printing it out at the end, to ensure that the number of records processed coincides with your original specifications.
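
Roughly, those two checks could be added like this. It is only a sketch: it assumes the file has already been converted to CRLF and that 48 and 49 are the record lengths without the delimiter. MINLEN, MAXLEN and the counter names are illustrative.

```basic
' Sketch only: filter the records, count them, and flag unexpected lengths.
' MINLEN, MAXLEN and the counter names are illustrative.
CONST MINLEN = 48
CONST MAXLEN = 49

OPEN "D:\FILE.TXT" FOR INPUT AS #1
OPEN "C:\CUIT\NEW.TXT" FOR OUTPUT AS #2

TOTAL& = 0      ' records read
BAD& = 0        ' records whose length is not 48 or 49
KEPT& = 0       ' records written to the new file

DO WHILE NOT EOF(1)
    LINE INPUT #1, RECORD$
    TOTAL& = TOTAL& + 1
    L% = LEN(RECORD$)
    IF L% < MINLEN OR L% > MAXLEN THEN
        BAD& = BAD& + 1
        PRINT "Record"; TOTAL&; "has length"; L%
    ELSEIF MID$(RECORD$, 44, 2) = "AC" THEN
        PRINT #2, RECORD$
        KEPT& = KEPT& + 1
    END IF
LOOP

CLOSE #1, #2

PRINT "Records read:   "; TOTAL&
PRINT "Records written:"; KEPT&
PRINT "Bad lengths:    "; BAD&
```

The counters are LONG variables because an INTEGER would overflow long before 3 million records.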

Let us know of the results or any new problems. Good luck!
*****


Title: "Out of string space" opening a big secuencial fil
Post by: alajandrolieber on September 21, 2005, 07:08:55 PM
Moneo:

Your help has been indispensable.


Thank you very much; I hope I can help you some day.

Alejandro Lieber
Rosario  Argentina


Title: "Out of string space" opening a big secuencial fil
Post by: Moneo on September 21, 2005, 09:54:35 PM
Quote from: "alajandrolieber"
Moneo:

Your help has been indispensable.


Thank you very much; I hope I can help you some day.

Alejandro Lieber
Rosario  Argentina

Alejandro,
Agreed, thanks, I am at your service.

But tell me, how did it go, what happened? If you don't want to air this here, write to me at:
moneo@prodigy.net.mx

*****