CUDA-GEMM-kernel
My attempt at writing a GEMM kernel... GOT ABSOLUTELY DESTROYED BY cuBLAS when tested on an RTX 4060 Ti (38 TFLOPS versus cuBLAS's 40 TFLOPS).