
Forums before death by AOL, social media and spammers... "We can't have nice things"

   comp.arch      Apparently more than just beeps & boops      131,241 messages   


   Message 129,448 of 131,241   
   Michael S to Terje Mathisen   
   Re: 3-way long addition (2/2)   
   20 Aug 25 14:16:55   
   
   [continued from previous message]   
      
     # ymm1[0:3] = c3[4:7]   
     # ymm2[0:3] = iSum3[0:3]   
     # ymm3[0:3] = iSum3[4:7]   
     # ymm5[0]   = carry   
     # output:   
     # ymm0[0:3] = iSum2[0:3]   
     # ymm1[0:3] = iSum2[4:7]   
     # ymm2[0:3] = cSum [0:3]   
     # ymm3[0:3] = cSum [4:7]   
     # ymm5[0]   = carry   
     # scratch: ymm4   
     vpermq   $0x93,      %ymm0, %ymm4   
      # ymm4[0:3] = c3[3,0,1,2]   
     vmovdqa  %ymm2,      %ymm0   
      # ymm0[0:3] = iSum2[0:3] = iSum3[0:3]   
     vpermq   $0x93,      %ymm1, %ymm2   
      # ymm2[0:3] = c3[7,4,5,6]   
     vpaddq   %xmm2,      %xmm5, %xmm5   
      # ymm5[0]   = carry += c3[7]   
     vmovdqa  %ymm3,      %ymm1   
      # ymm1[0:3] = iSum2[4:7] = iSum3[4:7]   
     vpblendd $3, %ymm4,  %ymm2, %ymm3   
      # ymm3[0:3] = cSum[4:7] = { c3[3], c3[4,5,6] }   
     vpxor    %xmm2,      %xmm2, %xmm2   
      # ymm2[0:3] = 0   
     vpblendd $3, %ymm2,  %ymm4, %ymm2   
      # ymm2[0:3] = cSum[0:3] = { 0, c3[0,1,2] }   
     jmp .add_carry   
      
   .prop_carry2:   
     # input:   
     # ymm0[0:3] = c3[0:3]   
     # ymm1[0:3] = c3[4:7]   
     # ymm2[0:3] = iSum3[0:3]   
     # ymm3[0:3] = iSum3[4:7]   
     # ymm5[0]   = carry   
     # output:   
     # ymm0[0:3] = iSum2[0:3]   
     # ymm1[0:3] = iSum2[4:7]   
     # ymm2[0:3] = cSum [0:3]   
     # ymm3[0:3] = cSum [4:7]   
     # ymm5[0]   = carry   
     # scratch: ymm4   
     vpermq   $0x93,      %ymm0, %ymm4   
      # ymm4[0:3] = c3[3,0,1,2]   
     vmovdqa  %ymm2,      %ymm0   
      # ymm0[0:3] = iSum2[0:3] = iSum3[0:3]   
     vpermq   $0x93,      %ymm1, %ymm2   
      # ymm2[0:3] = c3[7,4,5,6]   
     vmovdqa  %ymm3,      %ymm1   
      # ymm1[0:3] = iSum2[4:7] = iSum3[4:7]   
     vpblendd $3, %ymm4,  %ymm2, %ymm3   
      # ymm3[0:3] = cSum[4:7] = { c3[3], c3[4,5,6] }   
     vpxor    %xmm2,      %xmm2, %xmm2   
      # ymm2[0:3] = 0   
     vpblendd $3, %ymm2,  %ymm4, %ymm2   
      # ymm2[0:3] = cSum[0:3] = { 0, c3[0,1,2] }   
     jmp .add_carry2   
      
   .seh_endproc   
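The register juggling in the two carry-propagation tails may be easier to follow as a scalar model. The sketch below is a hypothetical C model: `ymm_t`, `vpermq`, `vpblendd` and `prop_carry_model` are illustrative names and do not come from the original code. Each ymm register is treated as four 64-bit lanes, with lane 0 least significant. The model omits the `vmovdqa` copies of iSum3 and the side effect of `vpaddq` on lane 1. `.prop_carry2` is modeled as identical except that it does not add c3[7] into the carry. Why it skips that step is not visible in this part of the listing.

```c
#include <stdint.h>

/* Hypothetical model: one 256-bit register as four 64-bit lanes, lane 0 lowest. */
typedef struct { uint64_t q[4]; } ymm_t;

/* vpermq $imm, src, dst: dst lane i = src lane ((imm >> 2*i) & 3). */
static ymm_t vpermq(unsigned imm, ymm_t src) {
    ymm_t dst;
    for (int i = 0; i < 4; i++)
        dst.q[i] = src.q[(imm >> (2 * i)) & 3];
    return dst;
}

/* vpblendd $imm, src2, src1, dst (AT&T operand order):
   dword j comes from src2 if bit j of imm is set, otherwise from src1. */
static ymm_t vpblendd(unsigned imm, ymm_t src2, ymm_t src1) {
    ymm_t dst;
    for (int i = 0; i < 4; i++) {
        uint64_t mask = 0;
        if ((imm >> (2 * i)) & 1)     mask |= 0x00000000FFFFFFFFull; /* low dword  */
        if ((imm >> (2 * i + 1)) & 1) mask |= 0xFFFFFFFF00000000ull; /* high dword */
        dst.q[i] = (src2.q[i] & mask) | (src1.q[i] & ~mask);
    }
    return dst;
}

/* Model of .prop_carry (update_carry = 1) and .prop_carry2 (update_carry = 0):
   cSum = { 0, c3[0], ..., c3[6] }, i.e. c3 moved up by one 64-bit limb,
   and c3[7], which falls off the top, is added to the scalar carry. */
static void prop_carry_model(ymm_t c3lo, ymm_t c3hi, uint64_t *carry,
                             int update_carry, ymm_t *csum_lo, ymm_t *csum_hi) {
    ymm_t t4 = vpermq(0x93, c3lo);          /* c3[3,0,1,2] */
    ymm_t t2 = vpermq(0x93, c3hi);          /* c3[7,4,5,6] */
    if (update_carry)
        *carry += t2.q[0];                  /* vpaddq: carry += c3[7] */
    *csum_hi = vpblendd(3, t4, t2);         /* { c3[3], c3[4,5,6] } */
    ymm_t zero = {{0, 0, 0, 0}};            /* vpxor */
    *csum_lo = vpblendd(3, zero, t4);       /* { 0, c3[0,1,2] } */
}
```

The immediate 0x93 is binary 10 01 00 11. It selects source lanes 3, 0, 1, 2 for destination lanes 0 to 3, which is a rotate up by one lane. The blend with immediate 3 then replaces lane 0: in the upper half it takes c3[3] from the lower half, and in the lower half it takes 0.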
      
   AVX2 is rather poorly suited for this task: it lacks unsigned   
   comparison instructions (its only 64-bit compare, VPCMPGTQ, is   
   signed). So the first input has to be shifted by half the range   
   (i.e. have its top bit flipped) at the beginning, and the result   
   has to be shifted back at the end.   
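The half-range trick can be illustrated with a minimal scalar C sketch, assuming 64-bit limbs. `HALF_RANGE`, `cmpgt_signed` and `add_biased` are illustrative names, not taken from the original code. `cmpgt_signed` stands in for one lane of VPCMPGTQ. The sketch relies on the usual two's-complement conversion from `uint64_t` to `int64_t`, which is implementation-defined in C but is what GCC, Clang and MSVC do.

```c
#include <stdint.h>

#define HALF_RANGE 0x8000000000000000ull  /* 2^63 */

/* One lane of a signed 64-bit "greater than", returning all-ones or zero
   like the vector instruction does. */
static uint64_t cmpgt_signed(uint64_t x, uint64_t y) {
    return (int64_t)x > (int64_t)y ? ~0ull : 0;
}

/* Add limb b to a biased limb a_b = a ^ 2^63.
   Adding 2^63 modulo 2^64 only flips the top bit, so the biased sum
   s_b = a_b + b equals (a + b) ^ 2^63. A signed compare of two biased
   values gives the same answer as an unsigned compare of the unbiased ones,
   so the unsigned wrap-around test (a > a + b) becomes a signed compare. */
static uint64_t add_biased(uint64_t a_b, uint64_t b, uint64_t *carry_out) {
    uint64_t s_b = a_b + b;
    /* The all-ones mask means "carry"; a SIMD loop would typically
       subtract the mask instead of masking it down to 1. */
    *carry_out = cmpgt_signed(a_b, s_b) & 1;
    return s_b;
}
```

A single addition is then `s = add_biased(a ^ HALF_RANGE, b, &c) ^ HALF_RANGE;`. The biased sum can be fed straight into the next addition, so presumably the bias is applied once at the start and removed once at the end, as described above.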
      
   AVX-512 could be a better fit, since it has unsigned compares into   
   mask registers (VPCMPUQ). But the only AVX-512-capable CPU I have   
   access to is a mini PC with a cheap, slow Core i3 that family members   
   use almost exclusively for watching movies. It does not even have a   
   minimal programming environment installed.   
      
   --- SoupGate-Win32 v1.05   
    * Origin: you cannot sedate... all the things you hate (1:229/2)   



(c) 1994,  bbs@darkrealms.ca