Use more fortran intrinsics#222
Merged
Conversation
…ran intrinsics. Change several loop-based assignments to use Fortran array slicing. Every assignment that involves an allocatable is explicitly sliced to the same size as the loop it replaces.
Contributor
Author
Restructured the commits into initialization and matmul/sliced-assignment changes, and dropped the IF reordering and the WinXP memory-limit change; those should be separate from this PR. Also added a few more matmul and initialization updates in the files already modified. I double-checked the assignments on allocatables; I believe all of them now explicitly slice the array to the same size as the previous loop method, including any matmuls that were implicitly sliced before.
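As a hedged illustration of the slicing pattern described above (variable names here are made up, not from the MYSTRAN sources): when an allocatable may be larger than the loop bounds, the replacement assignment is explicitly sliced so it touches exactly the same elements as the loop it replaces.

```fortran
! Hypothetical sketch of the explicit-slice pattern. KV is an
! allocatable that may be larger than N, so the replacement
! assignment is sliced to KV(1:N) to match the old loop exactly.
program slice_demo
  implicit none
  real, allocatable :: kv(:)
  integer :: n

  n = 3
  allocate(kv(10))
  kv = -1.0            ! elements beyond N hold data that must survive

  ! Old loop-based form:
  !   do i = 1,n
  !     kv(i) = 0.0
  !   end do

  ! Sliced replacement: clears exactly kv(1:n), not the whole array.
  kv(1:n) = 0.0

  print *, kv(1:4)     ! first three are 0.0; kv(4) is untouched
end program slice_demo
```

An unsliced `kv = 0.0` would clear all ten elements, silently changing behavior relative to the loop.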
This changes a bunch of loop-based array initializations to use Fortran array assignments, which has two benefits: less clutter in the codebase, and better optimization by the compiler. It also changes several calls of MATMULT_FFF and its transpose variant to use the Fortran intrinsics MATMUL and TRANSPOSE. This change is worth roughly a 10-second improvement on the big plate benchmark problem on the two machines I tested on.
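For illustration, here is the generic shape of the intrinsic substitution (MATMULT_FFF's actual interface lives in the MYSTRAN sources; this sketch only shows the kind of triple-nested product loop such a helper implements, and the MATMUL/TRANSPOSE form that replaces it):

```fortran
! Generic shape of the rewrite: a hand-written matrix-product loop
! replaced by the MATMUL and TRANSPOSE intrinsics, which the
! compiler is free to optimize aggressively.
program matmul_demo
  implicit none
  real :: a(2,3), b(3,2), c(2,2)
  integer :: i, j, k

  a = reshape([1.,2.,3.,4.,5.,6.], [2,3])
  b = reshape([1.,0.,1.,0.,1.,0.], [3,2])

  ! Loop-based form (what a helper like MATMULT_FFF computes):
  c = 0.0
  do j = 1,2
    do k = 1,3
      do i = 1,2
        c(i,j) = c(i,j) + a(i,k)*b(k,j)
      end do
    end do
  end do

  ! Intrinsic form, including the transposed variant:
  print *, maxval(abs(c - matmul(a, b)))              ! 0.0: identical
  print *, maxval(abs(transpose(c) &
                      - matmul(transpose(b), transpose(a))))  ! 0.0
end program matmul_demo
```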
There are a lot more places that use loop-based initialization. The same goes for MATMULT_FFF; I purposely kept that narrow, limited to areas I could test, because there are places where you need to think a bit to keep the behavior the same, i.e. if the arrays aren't trivially broadcastable.
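One such non-trivial case, sketched with invented names: when workspace arrays are overallocated, a bare `matmul` on the whole arrays would not conform (or would compute the wrong product), so the operands must be explicitly sliced to the shapes the original loops used.

```fortran
! When arrays are allocated larger than the active problem size,
! matmul needs explicit slices so the operand shapes conform to
! what the original loops computed.
program sliced_matmul_demo
  implicit none
  real, allocatable :: a(:,:), b(:,:), c(:,:)
  integer :: m, n, k

  m = 2; k = 3; n = 2
  allocate(a(4,4), b(4,4), c(4,4))
  a = 1.0; b = 2.0; c = 0.0

  ! matmul(a, b) would multiply the full 4x4 workspaces; slicing
  ! reproduces the m-by-k times k-by-n product the loops performed.
  c(1:m,1:n) = matmul(a(1:m,1:k), b(1:k,1:n))

  print *, c(1,1)   ! 6.0: sum of k = 3 terms of 1.0*2.0
end program sliced_matmul_demo
```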
Edit: This should go in a separate PR:
The other main change is to disable the Windows XP-era array size limit (~2GB) by setting it to 100TB. For some users/problems this may expose a preexisting issue that was masked by that limit: mystran overallocates the stiffness matrix to the maximum possible size, based on the number of elements times the number of entries in each element's dense stiffness matrix, then frees the unused memory after construction. It was pretty easy to hit the old limit, so mystran would allocate less than "intended" but still end up with a global stiffness matrix much smaller than even the artificially low limit. Now mystran will allocate the full amount based on that conservative estimate. This is only a practical concern now due to the recent performance improvements, so it seems unlikely to impact many end users. The fix here is either a tighter bounding estimate or some sort of reallocation strategy with a less conservative estimate.