Qbasicnews.com

QbasicNews.Com => Challenges => Topic started by: Blitz on April 13, 2003, 09:50:01 AM



Title: Who's up for a real challenge?
Post by: Blitz on April 13, 2003, 09:50:01 AM
Code the fastest transparent blitter you can for a 486. You're not restricted to any rules or languages. Do it as you please. It will run on a 486 DX4 100 mhz with 33 mhz bus. The video chip is an old cirrius logic on a 16 bit, 8 mhz (?) ISA bus.

The following tests will be performed.
Sprite to vram, aligned
Sprite to sram, aligned
Sprite to vram, unaligned
Sprite to sram, unaligned

The code has to be in by 18-04-2003 (friday). My mail is blitz_dotnet@hotmail.com


Title: Who's up for a real challenge?
Post by: Blitz on April 13, 2003, 09:54:38 AM
Oh yeah, the arguments are passed basic style.

Blit ( seg dst as any, byval x as integer, y as integer, seg src as any )
Blit ( byval dst as long, byval x as integer, y as integer, seg src as any )

Where dst is pointer to 64000 bytes, either vram or a backbuffer. And src is qb style sprite where first byte 0-1 is width*8 and byte 2-3 is height.

The transparent color is 0.


Title: Who's up for a real challenge?
Post by: Kackurot on April 13, 2003, 04:37:53 PM
Can RLE be used? compiled?


Title: Who's up for a real challenge?
Post by: Blitz on April 13, 2003, 05:58:50 PM
no


Title: Who's up for a real challenge?
Post by: toonski84 on April 14, 2003, 03:26:52 AM
this isnt just a made-up contest so you can enter ugl and win, is it blitzachinchilla? ;)


Title: Who's up for a real challenge?
Post by: Blitz on April 14, 2003, 06:36:56 AM
heh, no. I don't have to cheat to win. Not that i'm saying that i'm going to win. But i trust my skills.


Title: Who's up for a real challenge?
Post by: relsoft on April 16, 2003, 02:17:45 AM
Does it have clipping?


Title: Who's up for a real challenge?
Post by: Blitz on April 16, 2003, 06:17:14 AM
Yes, but if you don't want to. It doesn't matter, clipping doesn't affect speed at all.


Title: Who's up for a real challenge?
Post by: Fling-master on April 17, 2003, 11:13:58 PM
Depends how you implement it... if done correctly, then no. But if done the "lazy" way (which IMO is the hard way) then it does affect speed.

... hrm... yet again I'm posting useless garbage... meh.


Title: Who's up for a real challenge?
Post by: Blitz on April 17, 2003, 11:45:37 PM
It's pretty obvious to me that clipping is done at object level. If you're not doing it at object level then you still have allot to learn and challenges isn't really what they should be doing now is it?


Title: Who's up for a real challenge?
Post by: Blitz on April 18, 2003, 06:42:28 PM
The contest is over, noone enterd. You guys suck. :P


Title: Who's up for a real challenge?
Post by: Ninkazu on April 18, 2003, 06:45:53 PM
Quote from: "Blitz"
The contest is over, noone enterd. You guys suck. :P


BAHAHA! :lol:


Title: Who's up for a real challenge?
Post by: Fling-master on April 19, 2003, 11:25:05 AM
:rotfl:


Title: Who's up for a real challenge?
Post by: relsoft on April 21, 2003, 01:49:15 AM
Cya tomorrow. I'll post something...  Just that I was on vacation for a little R&R. :*)


Title: Who's up for a real challenge?
Post by: relsoft on April 22, 2003, 02:47:28 AM
Here:

Notes. Your BX indexing technique is slow. ;*) as compared to a di/si inc.

PutF uses your BX indexing technique and PutF2 uses the standard Inc Si/Di.

Code:

MODEL MEDIUM,BASIC
.STACK 30H
.386
.CODE

;SUB RelPutF(BYVAL DESTSEG%,BYVAL X%,BYVAL Y%,BYVAL SPRITESEGMENT%,BYVAL SPRITEOFFSET%)
;STACK
;DEST SEG       =14
;X              =12  
;Y              =10
;SPRITESEGMENT  =8
;SPRITOFFSET    =6
;RET SEG        =4
;RET OFF        =2
;BP             =0
;ds             =-2     ;REAL WIDTH
;320-WIDTH      =-4     ;FOR SPEEDIER LOOP NO SUB

;DS:SI                  =SPRITE SEG:SPRITE OFF
;ES:DI                  =DEST SEG:DEST OFF

align 2

PUBLIC RelPutF
RelPutF PROC

PUSH BP
mov bp,sp
sub sp,4

mov [bp-2],ds                   ;save ds

mov es,[bp+14]                  ;layer

mov ds,[bp+8]                   ;spr seg
mov si,[bp+6]                   ;spr off

mov dx,[si]                     ;wid
mov ax,[si+2]                   ;height

add si,4

shr dx,3                        ;wid\8

                                ;calc offset

mov cx,[bp+10]                  ;y
xchg ch,cl                      ;Y*256
mov di,cx                       ;save
shr di,2                        ;Y*64
add di,cx                       ;Y*64+Y*256=320
add di,[bp+12]                  ;Offset=Y*320+X

;notes:
;Al = Hieght
;dx = Wid
;bx = free
;cx = free

mov cx,320
sub cx,dx
mov [bp-4],cx                   ;save to stack

Yloop:
    xor bx,bx
    mov cx,dx                   ;save wid

Xloop:

    mov ah,[si+bx]
    or ah,ah

    jz skip
        mov es:[di+bx],ah
    skip:

    inc bx
    dec cx

jnz Xloop

    add si,bx
    add bx,[bp-4]
    add di,bx
    dec al
jnz Yloop

mov ds,[bp-2]
ADD SP,4
POP BP                          ;RESTORE BP

RET 10

RelPutF ENDP

END    





Here's the one which is faster:

Code:


MODEL MEDIUM,BASIC
.STACK 30H
.386
.CODE

;SUB RelPutF2(BYVAL DESTSEG%,BYVAL X%,BYVAL Y%,BYVAL SPRITESEGMENT%,BYVAL SPRITEOFFSET%)
;STACK
;DEST SEG       =14
;X              =12  
;Y              =10
;SPRITESEGMENT  =8
;SPRITOFFSET    =6
;RET SEG        =4
;RET OFF        =2
;BP             =0
;ds             =-2     ;REAL WIDTH
;320-WIDTH      =-4     ;FOR SPEEDIER LOOP NO SUB

;DS:SI                  =SPRITE SEG:SPRITE OFF
;ES:DI                  =DEST SEG:DEST OFF

align 2

PUBLIC RelPutF2
RelPutF2 PROC

PUSH BP
mov bp,sp
sub sp,4

mov [bp-2],ds                   ;save ds

mov es,[bp+14]                  ;layer

mov ds,[bp+8]                   ;spr seg
mov si,[bp+6]                   ;spr off

mov dx,[si]                     ;wid
mov bx,[si+2]                   ;height

add si,4

shr dx,3                        ;wid\8

                                ;calc offset

mov cx,[bp+10]                  ;y
xchg ch,cl                      ;Y*256
mov di,cx                       ;save
shr di,2                        ;Y*64
add di,cx                       ;Y*64+Y*256=320
add di,[bp+12]                  ;Offset=Y*320+X

;notes:
;Ax = free
;dx = Wid
;bx = Height
;cx = free

mov cx,320
sub cx,dx
mov [bp-4],cx                   ;save to stack

Yloop:
    mov cx,dx                   ;save wid

Xloop:

    mov al,[si]
    inc si

    or al,al
    jz skip
        mov es:[di],al
    skip:

    inc di
    dec cx

jnz Xloop
    add di,[bp-4]
    dec bx
jnz Yloop

mov ds,[bp-2]
ADD SP,4
POP BP                          ;RESTORE BP

RET 10

RelPutF2 ENDP

END




To test:

Code:


DECLARE SUB RelPutF (BYVAL DESTSEG%, BYVAL X%, BYVAL Y%, BYVAL SPRITESEGMENT%, BYVAL SPRITEOFFSET%)
DECLARE SUB RelPutF2 (BYVAL DESTSEG%, BYVAL X%, BYVAL Y%, BYVAL SPRITESEGMENT%, BYVAL SPRITEOFFSET%)

DEFINT A-Z

DIM Vpage(31999) AS INTEGER

'Test Vars
DIM PutTestMem  AS SINGLE
DIM PutTestMem2 AS SINGLE
DIM PutTestVid  AS SINGLE
DIM PutTestVid2 AS SINGLE


CLS
SCREEN 13
W = 15
H = 15



Layer = VARSEG(Vpage(0))
Size = ((W + 1) * (H + 1) + 4) \ 2
DIM Array(Size) AS INTEGER
FOR I = 0 TO H
  FOR J = 0 TO W
        PSET (J, I), (I OR J) + 150
  NEXT
NEXT

GET (0, 0)-(W, H), Array

SprSeg = VARSEG(Array(0))
SprOff = VARPTR(Array(0))
TIMER ON
    T# = TIMER
    FOR n& = 1 TO 500000
      RelPutF Layer, 100, 100, SprSeg, SprOff
    NEXT
    PutTestMem = n& / (TIMER - T#)

    T# = TIMER
    FOR n& = 1 TO 500000
      RelPutF2 Layer, 100, 100, SprSeg, SprOff
    NEXT
    PutTestMem2 = n& / (TIMER - T#)

    'VIDEO
    T# = TIMER
    FOR n& = 1 TO 500000
      RelPutF &HA000, 100, 100, SprSeg, SprOff
    NEXT
    PutTestVid = n& / (TIMER - T#)

    T# = TIMER
    FOR n& = 1 TO 500000
      RelPutF2 &HA000, 100, 100, SprSeg, SprOff
    NEXT
    PutTestVid2 = n& / (TIMER - T#)


    CLS
    SCREEN 0
    WIDTH 80

    PRINT STR$(PutTestMem) + " For PutF aka [Si/di+bx] trick (Base mem)"
    PRINT STR$(PutTestMem2) + " For PutF2 aka inc si/di (Base mem)"
    PRINT STR$(PutTestVid) + " For PutF aka [Si/di+bx] trick  (Video mem)"
    PRINT STR$(PutTestVid2) + " For PutF2 aka inc si/di (Video mem)"

    C$ = INPUT$(1)

CLS
SCREEN 0
END



Title: Who's up for a real challenge?
Post by: Blitz on April 22, 2003, 12:43:18 PM
i never said i was going to use it, pointless opn 486. Since noone enterd i didn't bother myself either. So i never finished it, doesn't align to vram. But beat this. It's faster then ugl on 486.

Code:

            .model medium, basic            
            .386
            .data
innerloop   word        ?
            .code
           
Blit486     proc    public uses bx di si ds es,\
                    pdst:far ptr, x:word, y:word, psrc:far ptr

            local   dstXres:word, dstYres:word                    
            local   srcXres:word, srcYres:word
           
            ;;
            ;; es:di -> dst
            ;; ds:si -> src
            ;;
            lds     si, psrc
            les     di, pdst            

            cmp     word ptr [pdst+2], 0a000h
            jne     @@sramloop
            mov     innerloop, offset vram_inner
            jmp     @@conta
@@sramloop: mov     innerloop, offset sram_inner
           

@@conta:            
            ;;
            ;; Get destination width and height
            ;;
            mov     dstYres, 200
            mov     dstXres, 320
           
            ;;
            ;; Get source width and height
            ;;
            mov     ax, ds:[si]
            mov     cx, ds:[si+2]
            shr     ax, 3
            mov     srcYres, cx
            mov     srcXres, ax
           
            ;;
            ;; Setup destination adress
            ;;
            mov     ax, y
            mul     dstXres
            add     di, x
            add     di, ax            
            add     si, 4
           
            mov     bx, srcXres
            mov     cx, srcYres
           
            or      bx, bx
            jz      @@exit
            or      cx, cx
            jz      @@exit            
           
           
@@oloop:    call    innerloop
            add     si, srcXres
            add     di, dstXres
            dec     cx
            jnz     @@oloop

@@exit:     ret
Blit486     endp
                   
           

           
           
;;
;; bx -> pixel count
;; ds:si -> source
;; es:di -> destination
;;
sram_inner  proc    near private uses ax bx cx dx di si bp
           
            ;;
            ;; fs -> single loop count
            ;; gs -> quad loop count
            ;;
            xor     ax, ax
            mov     fs, bx            
            mov     gs, ax
            cmp     bx, 8
            jl      @@bloop_s
           
            ;;
            ;; Quad pixels
            ;;            
            mov     bp, bx
            shr     bp, 2
            shl     bp, 2
            mov     gs, bp
           
            ;;
            ;; Single pixels
            ;;
            and     bx, 3
            jz      @@qloop_s
            mov     fs, bx            
           
           
            ;;
            ;; Single pixel loop
            ;;            
@@bloop_s:  mov    bp, fs
            add    si, bp
            add    di, bp
            neg    bp            
           
@@bloop_i:  mov    cl, ds:[si+bp]  ;; 1
            mov    dl, es:[di+bp]  ;; 2

            ;;
            ;; Create mask
            ;;
            cmp    cl, 1           ;; 3
            sbb    al, al          ;; 4
           
            ;;
            ;; Combine pixels
            ;;            
            and    al, dl          ;; 5
            or     al, cl          ;; 6
           
            mov    es:[di+bp], al  ;; 7
           
            ;;
            ;; Pixels left ?
            ;;
            inc    bp              ;; 8
            jnz    @@bloop_i       ;; 8-11
           
           
            ;;
            ;; Quad pixel loop
            ;;
@@qloop_s:  mov    bp, gs
            or     bp, bp
            jz     @@exit
           
            add    si, bp
            add    di, bp
            neg    bp            
           
@@qloop_i:  mov    ecx, ds:[si+bp] ;; 1
            mov    edx, es:[di+bp] ;; 2

            ;; ecx -> source
            ;; edx -> destination            
            mov    ebx, ecx        ;; 3
            shr    ebx, 16         ;; 3-5
           
            ;;
            ;; Create mask
            ;;
            cmp    bl, 1           ;; 6
            sbb    al, al          ;; 7
            cmp    bh, 1           ;; 8
            sbb    ah, ah          ;; 9
            shl    eax, 16         ;; 9-11
            cmp    cl, 1           ;; 12
            sbb    al, al          ;; 13
            cmp    ch, 1           ;; 14
            sbb    ah, ah          ;; 15
           
            ;;
            ;; Combine source and destination
            ;;            
            and    eax, edx        ;; 16
            or     eax, ecx        ;; 17
            mov    es:[di+bp], eax ;; 18 (Hopefully)
           
            ;;
            ;; More pixels?
            ;;
            add    bp, 4           ;; 19
            jnz    @@qloop_i       ;; 19-22
                       
@@exit:     ret
sram_inner  endp


;;
;; bx -> pixel count
;; ds:si -> source
;; es:di -> destination
;;
vram_inner  proc    near private uses ax bx cx di si bp
            jmp     @@begin
vramTB      word    @@cont, @@casea, @@caseb, @@casec

@@begin:    mov     bp, bx
            add     si, bx
            add     di, bx
            neg     bp

@@loop:     mov     ax, ds:[si+bp]

            or      al, al
            setnz   bl
            or      ah, ah
            setnz   bh
           
            shl     bl, 1
            shl     bh, 2
            or      bl, bh
            xor     bh, bh
            jmp     [vramTB+bx]
           
@@casea:    mov     es:[di+bp+0], al
            jmp     @@cont
@@caseb:    mov     es:[di+bp+1], ah
            jmp     @@cont
@@casec:    mov     es:[di+bp+0], ax
           
@@cont:     add     bp, 2
            jnz     @@loop

@@exit:     ret
vram_inner  endp
            end


Title: Who's up for a real challenge?
Post by: Blitz on April 22, 2003, 12:53:55 PM
Oh, and it's.
Code:

        add     si, bx
        add     di, bx
        neg     bx
@@iloop:
        mov     al, ds:[si+bx]
        or      al, al
        jz      @@skip
        mov     es:[di+bx], al
@@skip: inc     bx
        jnz     @@iloop


Title: Who's up for a real challenge?
Post by: Blitz on April 25, 2003, 09:12:10 AM
No comments rel? That's a first :P


Title: Who's up for a real challenge?
Post by: Agamemnus on April 25, 2003, 06:07:41 PM
I'd post something but this whole thread isn't in English...


Title: Who's up for a real challenge?
Post by: LooseCaboose on April 26, 2003, 02:02:48 AM
I just glanced at this thread so I could be way off, but wouldnt something like the following be sufficient:

Code:

void blit(void *dst, int x, int y, void *src) {
  int i, width, height;

  /*
   * Work out the height and width and offset
   * dst correctly. Cant be bothered.
   */

  for(i = 0; i < width * height; i++) {
    /* Gives transparent colour = 0 */
    if(*src) {
      /* Copy */
      *dst++ = *src++;
    }
  }
}


Much nicer than programming in assembly, plus its completely portable and should give reasonably good object good if you use a good compiler (gcc -O3).


Title: Who's up for a real challenge?
Post by: Blitz on April 26, 2003, 05:02:15 AM
Well, sure it would work, if yyou changed the loop a little. But that wouldn't give 1/10 the performance of my loops.


Title: Who's up for a real challenge?
Post by: relsoft on April 27, 2003, 03:12:13 AM
Blitz: How much speed difference? Which was faster?  Man, that masking thingy is kool!!!!


Title: Who's up for a real challenge?
Post by: Blitz on April 27, 2003, 06:07:07 AM
The sram loop is allot faster then ugls 386 loop. And the vram loop is basically the same.


Title: Who's up for a real challenge?
Post by: LooseCaboose on April 27, 2003, 11:54:43 PM
Like I said it would depend on the compiler and the optomizations flags used, but a hand written asm version would probably still beat it, I could make it slightly faster with register keywords on the variables and some more tidying.

I just like the C version better, because its far more portable and flexible. Coding graphics in asm is okay for a hobby, but most new graphics cards have hardware blitters (and most other functions too) so learning how to write one in asm for a 486 does seem kind of pointless to me, however as always you have churned out some truely impressive code :lol:


Title: Who's up for a real challenge?
Post by: Blitz on April 30, 2003, 05:08:42 AM
Good asm code is an art. And in the game industry, it always has been and will be very important.


Title: Who's up for a real challenge?
Post by: Blitz on April 30, 2003, 05:18:14 AM
Also, knowing asm is very useful even when using high level languages. You'd think compilers generate stable code, but that's not always the case with for instance watcom or borland. By dissasembling it i can see exactly what it does and find out where the bug is.

Or when i'm trying to get a certain loop to run as fast as possible, i usally tinker with the loop and dissassemble it until i find a code the makes the compiler generate as good code as it can (which always suxs pretty much).

Knowing asm makes a better high level coder as well, since you know how the computer works you know how to code optimal code.

So i don't agree, it will always be important. There's a huge difference between knowing asm and being good at it. DQB is the perfect example of why some people should just stick to compilers.


Title: Who's up for a real challenge?
Post by: Blitz on May 20, 2005, 06:19:43 AM
So who won?


Title: Who's up for a real challenge?
Post by: relsoft on May 20, 2005, 07:42:56 AM
Digs up his ancient plaque and gives it to Blitz....
 :D


Title: Who's up for a real challenge?
Post by: Blitz on May 20, 2005, 08:33:59 AM
Looking through old posts, this place was much more fun 2 years ago.


Title: Who's up for a real challenge?
Post by: KiZ on May 20, 2005, 05:47:23 PM
Get over it already.

Most of the pros left.

Most of the new pros are into non-demo related stuff. Its just the way it develops. Demo, GFX and game programming arent real big anymore. There is no point in creating high speed blitters or any of that stuff, it would be reinventing the wheel.

Stop bitching because you feel nostalgic looking at this thread.

(Bitching might have been to harsh a word, sorry.)


Title: Who's up for a real challenge?
Post by: Blitz on May 20, 2005, 05:58:58 PM
I advise you to do the same. No one is bitching nor whining here except you at the moment. So why don't you drop the whole hostility act and play nice.


Title: Who's up for a real challenge?
Post by: j2krei08 on May 21, 2005, 03:49:34 PM
Dang you, RADIUM-V...