Moneo
|
 |
« on: July 04, 2003, 01:53:12 PM » |
|
3-WAY MERGE PROGRAM
PURPOSE: Write a program to merge 3 input text files, by key, and generate a merged output text file.
GIVEN: * There are 3 input text files: AAA, BBB, and CCC. * Let's call the output file DDD. * The 3 files are always present and contain at least 1 record. * Each file should be in sequence within itself. * The records are variable length, minimum 4 bytes, maximum 100. * The key is in the first 4 bytes of each record. * The key is strictly numeric, unsigned with leading zeros. * The key can have values from 0000 to 9999. * Records with duplicate keys can occur. * Therefore, the output file could have over 30,000 records.
TIP: In order not to have to check the sequence of each input file, just check the sequence of the output file. If it's out of sequence, then (1) one of the input files was out of sequence, or (2) your program is not merging correctly.
NOTE: This is a problem that has appeared on programming aptitude tests. It is not as simple as it seems. I suggest you do a flowchart first. It wil take about 70-100 lines of code, without comments. Please debug your code before posting.
I will prepare some sample input files on my machine for testing your posted programs.
Good luck! *****
|
|
|
Logged
|
|
|
|
Antoni Gual
|
 |
« Reply #1 on: July 04, 2003, 08:09:45 PM » |
|
1.-The records are variable length? How are they separed? CR+LF?
2.-We can suppose the records in the input files have been sorted previously?
|
|
|
Logged
|
Antoni
|
|
|
Moneo
|
 |
« Reply #2 on: July 04, 2003, 08:47:19 PM » |
|
Yes, Antoni, the records end with CR LF.
You can "suppose" that the records on each input file have been sorted, but you should sequence check them anyway. I ALWAYS check it to make sure. Otherwise, like I said in some other post, my program looks like it failed when the output comes out unsorted. In this particular program, I would only check the sequence of the output. *****
|
|
|
Logged
|
|
|
|
Moneo
|
 |
« Reply #3 on: July 09, 2003, 03:48:16 PM » |
|
Well it looks like nobody jumped in with a solution. You probably figured out that this is a tricky program. Let me give you a few tips that really simplify the logic of this program:
COMPARE CONTROL INDICATOR: ---------------------------------------- I call this NEXTLOW. This is a integer whose bits indicate the results of comparing the keys of the 3 files (AAA,BBB,CCC). Bit 1 of NEXTLOW corresponds to file AAA. Bit 2 of NEXTLOW corresponds to file BBB. Bit 4 of NEXTLOW corresponds to file CCC.
As a result of the compares, you OR the corresponding bit or bits into NEXTLOW based on which file key or keys are low. Then: NEXTLOW is 1 when key of file AAA compared the lowest. NEXTLOW is 2 when key of file BBB compared the lowest. NEXTLOW is 3 when keys of AAA & BBB compared the lowest. NEXTLOW is 4 when key of file CCC compared the lowest. NEXTLOW is 5 when keys of AAA & CCC compared the lowest. NEXTLOW is 6 when keys of BBB & CCC compared the lowest. NEXTLOW is 7 when keys of AAA, BBB & CCC compared equal.
When you're done with the compares and setting up NEXTLOW, which is about 16 lines of code, the rest of the program is trivial because the NEXTLOW indicator tells you what you have to do. Example: If NEXTLOW IS 5, you have to write the records from file AAA and CCC to the output, and then you have to read new records from AAA and CCC, and go back to the compare logic again.
HIGHVALUE CONTROL: --------------------------- Using the concept of "highvalue" save you from setting switches to indicate that particular files have gone to end-of-file. How does this work? When the read for a particular file detects an end-of-file, you stuff a "highvalue" into the key for this file. A highvalue is a string or number that you have determined will be higher that any key of any file. In this sample program, we have a 4 byte key, To be absolutely safe we could set up a highvalue of 5 bytes of hex'FF'. Having done this for all 3 files, then at the end of the compare logic we add one line of code: IF NEXTLOW = 7 AND KEY.OF.AAA = HIGHVALUE THEN GOTO EOJ
What has happened is that the HIGHVALUE has flushed all the files. When the keys of all 3 files are sitting at HIGHVALUE, the program is finished --- there's no more records to process.
The above concepts of NEXTLOW and HIGHVALUE have been used successfully for over 40 years by industry expert. I didn't just pull them out of my hat. *****
|
|
|
Logged
|
|
|
|
Antoni Gual
|
 |
« Reply #4 on: July 11, 2003, 08:56:26 PM » |
|
A little late... Errors I have considered: Input or output files can't be opened Input files out of order Blank lines files with 0 records .... But ,you know, the errors can come in infinite flavours... '3 Way Merge by Antoni Gual 2003
SCREEN 0: CLS
ON ERROR GOTO FILERR OPEN "aaa.txt" FOR INPUT AS #1 IF ERRATA THEN PRINT "FILE AAA.TXT CAN'T BE OPENED": END OPEN "bbb.txt" FOR INPUT AS #2 IF ERRATA THEN PRINT "FILE BBB.TXT CAN'T BE OPENED": END OPEN "ccc.txt" FOR INPUT AS #3 IF ERRATA THEN PRINT "FILE CCC.TXT CAN'T BE OPENED": END OPEN "ddd.txt" FOR OUTPUT AS #4 IF ERRATA THEN PRINT "OUTPUT FILE CAN'T BE OPENED": END ON ERROR GOTO 0
CONST highval = &H7FFF DIM A$(1 TO 3) DIM ln(1 TO 3) AS INTEGER DIM cnt(1 TO 3) AS INTEGER FOR i% = 1 TO 3 IF LOF(i%) = 0 THEN ln(i%) = highval ELSE ln(i%) = -1 NEXT MIN% = -2
DO LMIN% = MIN% MIN% = highval: indx% = 0 FOR i% = 1 TO 3 IF ln(i%) < MIN% THEN MIN% = ln(i%): indx% = i% END IF NEXT IF MIN% = highval THEN EXIT DO ELSE IF MIN% < LMIN% THEN PRINT USING "FILE _## OUT OF ORDER AT POS ## \ \"; indx%; cnt(indx%); A$(indx%) CLOSE KILL "DDD.TXT" END END IF IF MIN% >= 0 THEN PRINT #4, A$(indx%) PRINT A$(indx%) END IF DO IF EOF(indx%) THEN ln(indx%) = highval: EXIT DO ELSE LINE INPUT #indx%, A$(indx%) IF LEN(A$(indx%)) = 0 THEN PRINT USING "Skipping blank line after record _#### in file #"; cnt(indx%); indx% GOTO skipblank END IF cnt(indx%) = cnt(indx%) + 1 ln(indx%) = VAL(LEFT$(A$(indx%), 4)) END IF skipblank: LOOP UNTIL LEN(A$(indx%)) END IF LOOP PRINT PRINT USING "#### records of file 1 "; cnt(1) PRINT USING "#### records of file 2 "; cnt(2) PRINT USING "#### records of file 3 "; cnt(3) PRINT "Have been sorted" CLOSE END
FILERR: ERRATA = ERR: RESUME NEXT
|
|
|
Logged
|
Antoni
|
|
|
Moneo
|
 |
« Reply #5 on: July 11, 2003, 10:12:10 PM » |
|
Excellent Antoni. I gave it a good test with all 7 posible matching conditions, and it works 100%.
Too bad I can't understand your code, as usual. I'm surprised you didn't manage to use recursion. *****
|
|
|
Logged
|
|
|
|
Antoni Gual
|
 |
« Reply #6 on: July 12, 2003, 05:11:47 AM » |
|
No it' does not use recursion, but it has a GOTO, that's even worse. :evil:
|
|
|
Logged
|
Antoni
|
|
|
Moneo
|
 |
« Reply #7 on: July 12, 2003, 08:25:38 PM » |
|
ANTONI, This is your original code which contains the GOTO. DO IF EOF(indx%) THEN ln(indx%) = highval: EXIT DO ELSE LINE INPUT #indx%, A$(indx%) IF LEN(A$(indx%)) = 0 THEN PRINT USING "Skipping blank line after record _#### in file #"; cnt(indx%); indx% GOTO skipblank END IF cnt(indx%) = cnt(indx%) + 1 ln(indx%) = VAL(LEFT$(A$(indx%), 4)) END IF skipblank: LOOP UNTIL LEN(A$(indx%))
I think it can be structured as follows to avoid using GOTO. DO IF EOF(indx%) THEN ln(indx%) = highval: EXIT DO ELSE LINE INPUT #indx%, A$(indx%) IF LEN(A$(indx%)) = 0 THEN PRINT USING "Skipping blank line after record _#### in file #"; cnt(indx%); indx% ELSE cnt(indx%) = cnt(indx%) + 1 ln(indx%) = VAL(LEFT$(A$(indx%), 4)) END IF END IF LOOP UNTIL LEN(A$(indx%))
Won't this work? *****
|
|
|
Logged
|
|
|
|
Moneo
|
 |
« Reply #8 on: July 19, 2003, 06:24:52 PM » |
|
Hey guys, This challenge has been open for 2 weeks and we only had one submitted solution, an excellent piece of code by Antoni, which complies with all the specifications and works 100%.
The imlementation of this solution is considered a classic programming problem. You all should take a crack at it to appreciate the subtleties. You will definitely learn by doing.
I'm going to establish a deadline of 1 week from today, 26-Jul-2003 23:59 GMT for additional solutions, otherwise Antoni will be the winner. *****
|
|
|
Logged
|
|
|
|
|