Forums before death by AOL, social media and spammers... "We can't have nice things"
|    comp.arch    |    Apparently more than just beeps & boops    |    131,241 messages    |
|    Message 129,448 of 131,241    |
|    Michael S to Terje Mathisen    |
|    Re: 3-way long addition (2/2)    |
|    20 Aug 25 14:16:55    |
   
   [continued from previous message]   
      
    # ymm1[0:3] = c3[4:7]   
    # ymm2[0:3] = iSum3[0:3]   
    # ymm3[0:3] = iSum3[4:7]   
    # ymm5[0] = carry   
    # output:   
    # ymm0[0:3] = iSum2[0:3]   
    # ymm1[0:3] = iSum2[4:7]   
    # ymm2[0:3] = cSum [0:3]   
    # ymm3[0:3] = cSum [4:7]   
    # ymm5[0] = carry   
    # scratch: ymm4   
    vpermq $0x93, %ymm0, %ymm4   
    # ymm4[0:3] = c3[3,0,1,2]   
    vmovdqa %ymm2, %ymm0   
    # ymm0[0:3] = iSum2[0:3] = iSum3[0:3]   
    vpermq $0x93, %ymm1, %ymm2   
    # ymm2[0:3] = c3[7,4,5,6]   
    vpaddq %xmm2, %xmm5, %xmm5   
    # ymm5[0] = carry += c3[7]   
    vmovdqa %ymm3, %ymm1   
    # ymm1[0:3] = iSum2[4:7] = iSum3[4:7]   
    vpblendd $3, %ymm4, %ymm2, %ymm3   
    # ymm3[0:3] = cSum[4:7] = { c3[3], c3[4,5,6] }   
    vpxor %xmm2, %xmm2, %xmm2   
    # ymm2[0:3] = 0   
    vpblendd $3, %ymm2, %ymm4, %ymm2   
    # ymm2[0:3] = cSum[0:3] = { 0, c3[0,1,2] }   
    jmp .add_carry   
      
   .prop_carry2:   
    # input:   
    # ymm0[0:3] = c3[0:3]   
    # ymm1[0:3] = c3[4:7]   
    # ymm2[0:3] = iSum3[0:3]   
    # ymm3[0:3] = iSum3[4:7]   
    # ymm5[0] = carry   
    # output:   
    # ymm0[0:3] = iSum2[0:3]   
    # ymm1[0:3] = iSum2[4:7]   
    # ymm2[0:3] = cSum [0:3]   
    # ymm3[0:3] = cSum [4:7]   
    # ymm5[0] = carry   
    # scratch: ymm4   
    vpermq $0x93, %ymm0, %ymm4   
    # ymm4[0:3] = c3[3,0,1,2]   
    vmovdqa %ymm2, %ymm0   
    # ymm0[0:3] = iSum2[0:3] = iSum3[0:3]   
    vpermq $0x93, %ymm1, %ymm2   
    # ymm2[0:3] = c3[7,4,5,6]   
    vmovdqa %ymm3, %ymm1   
    # ymm1[0:3] = iSum2[4:7] = iSum3[4:7]   
    vpblendd $3, %ymm4, %ymm2, %ymm3   
    # ymm3[0:3] = cSum[4:7] = { c3[3], c3[4,5,6] }   
    vpxor %xmm2, %xmm2, %xmm2   
    # ymm2[0:3] = 0   
    vpblendd $3, %ymm2, %ymm4, %ymm2   
    # ymm2[0:3] = cSum[0:3] = { 0, c3[0,1,2] }   
    jmp .add_carry2   
      
   .seh_endproc   
      
   AVX2 is rather poorly suited for this task: it has no unsigned integer
   compares (for 64-bit lanes only the signed vpcmpgtq), so the first input
   has to be offset by half the range (2^63, i.e. its sign bit flipped) at
   the beginning, and the result offset back at the end.
      
   AVX-512 may be better suited, since it does have unsigned compares
   (vpcmpuq) that write directly into mask registers. But the only
   AVX-512-capable CPU I have access to is a mini PC with a cheap, slow
   Core i3 that family members use almost exclusively for watching movies.
   It does not even have a minimal programming environment installed.
      
(c) 1994, bbs@darkrealms.ca