improve Simpleupdate performances by ogauthe · Pull Request #361 · QuantumKitHub/PEPSKit.jl

ogauthe · 2026-04-16T16:07:38Z

This is a work in progress to improve SimpleUpdate performance.

Context: SimpleUpdate is quite slow, especially for 2nd neighbor interactions. I had cases where it became the bottleneck, costing more than CTMRG. I started working on its performance last week and realized that many of the changes could be ported upstream.

I followed four strategies:

improving type stability
removing copies where I think they could be avoided
removing intermediate normalization
removing permutations by simplifying perm1 ∘ perm2 into a single permutation
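The last strategy can be illustrated on plain arrays. This is only a minimal sketch, assuming `permutedims` on a dense `Array` as a stand-in for the tensor permutations touched by the PR; the variable names are hypothetical.

```julia
# Toy illustration with plain arrays, not PEPSKit tensors.
A = collect(reshape(1:24, 2, 3, 4))
perm1 = (3, 1, 2)
perm2 = (2, 1, 3)

# Two passes over the data, with one intermediate array.
two_steps = permutedims(permutedims(A, perm1), perm2)

# Compose the permutations first: output dimension j of the second call
# reads dimension perm1[perm2[j]] of the original array.
composed = map(j -> perm1[j], perm2)
one_step = permutedims(A, composed)
```

Both routes give the same array, but the composed version moves the data once and allocates no intermediate.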

It already shows a significant speed-up, e.g. for finite-temperature J1-J2 with SU(2) symmetry. I know it is possible to do much better. I just saw #360, which overlaps with this work in _get_cluster_trunc, so I thought it relevant to mention my work here (the objective is not to block any part of #360).

codecov · 2026-04-16T16:34:39Z

Codecov Report

❌ Patch coverage is 93.63296% with 17 lines in your changes missing coverage. Please review.

Files with missing lines	Patch %	Lines
src/algorithms/time_evolution/get_cluster.jl	83.78%	12 Missing ⚠️
src/algorithms/time_evolution/simpleupdate.jl	97.10%	2 Missing ⚠️
src/algorithms/time_evolution/simpleupdate3site.jl	88.23%	2 Missing ⚠️
src/utility/util.jl	83.33%	1 Missing ⚠️

Files with missing lines	Coverage Δ
src/algorithms/contractions/absorb_weight.jl	`100.00% <100.00%> (ø)`
src/algorithms/contractions/bondenv/als_solve.jl	`84.90% <ø> (-11.46%)`	⬇️
src/algorithms/contractions/bondenv/benv_ctm.jl	`100.00% <100.00%> (ø)`
src/algorithms/contractions/bondenv/benv_tools.jl	`50.00% <ø> (ø)`
src/algorithms/contractions/bondenv/gaugefix.jl	`100.00% <100.00%> (ø)`
src/algorithms/time_evolution/apply_gate.jl	`85.00% <100.00%> (-7.86%)`	⬇️
src/algorithms/time_evolution/apply_mpo.jl	`99.20% <100.00%> (-0.80%)`	⬇️
src/algorithms/time_evolution/time_evolve.jl	`89.18% <ø> (-2.71%)`	⬇️
src/algorithms/truncation/bond_tensor.jl	`100.00% <100.00%> (ø)`
src/algorithms/truncation/bond_truncation.jl	`93.67% <100.00%> (-1.21%)`	⬇️
... and 5 more

... and 34 files with indirect coverage changes

Yue-Zhengyuan

Thank you for improving SU! Left some initial comments.

Yue-Zhengyuan · 2026-04-17T01:02:41Z

            (d, r, c), = _nn_bondrev(sites..., (Nr, Nc))
            alg.bipartite && r > 1 && continue
-            ϵ′ = _su_iter!(state2, gate, env2, sites, alg)
+            ϵ′ = _su_iter_gate!(state2, gate, env2, sites[1], sites[2], alg)


Here I wanted _su_iter! to be able to handle both AbstractTensorMap{E, S, 2, 2} and length-2 MPO gates, though this is mainly for test purposes to ensure that MPO gates are applied correctly even for fermions. Is it possible to recover this behavior without loss of performance?

Oh, I now see the point of using dispatch. There is a tradeoff here between code readability/efficiency (gate is much faster) and this test feature. Is this something that is currently tested?

[copy from above] In su_iter, we need an if statement to distinguish nearest neighbor from long distance at run time. Therefore my idea was that dispatch does not bring much here, while using different function names makes it explicit and easier to find which function/method is called.

Is this something that is currently tested?

It is tested with Hubbard model in test/timeevol/cluster_projectors.jl.

gate is much faster

The MPO way can still be optimized. When applying the gate MPO on the first and the last cluster sites, we can also use the trick of reduced bond tensors, which has not been done yet. Even for longer-range (e.g. NNN) bonds, this can still be useful. Separating out two unitary tensors at the two ends of an OBC-MPO will not affect finding the Vidal gauge.

We can try this out in a follow-up PR if you prefer not to do too much here.

Ok, I will revert to dispatch. I am keeping this conversation and closing the other one related to it.
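For readers following the dispatch discussion above, here is a minimal sketch of the two options. The types `DenseGate` and `FactoredGate` and all function names are hypothetical stand-ins built on plain matrices; they are not PEPSKit types or the signatures used in the PR.

```julia
# Hypothetical stand-ins for the two kinds of gate; not PEPSKit types.
struct DenseGate
    data::Matrix{Float64}
end
struct FactoredGate
    factors::Vector{Matrix{Float64}}
end

# Option 1: one generic name, two methods. The call site is identical for both kinds of gate.
apply_gate(g::DenseGate, v::Vector{Float64}) = g.data * v
apply_gate(g::FactoredGate, v::Vector{Float64}) = foldl((w, m) -> m * w, g.factors; init = v)

# Option 2: explicit names, which make it obvious at the call site which code runs.
apply_dense_gate(g::DenseGate, v) = g.data * v
apply_factored_gate(g::FactoredGate, v) = foldl((w, m) -> m * w, g.factors; init = v)
```

With dispatch, a test can pass either representation through the same entry point, which is the behavior the review asks to keep.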

Yue-Zhengyuan · 2026-04-19T07:43:00Z

There is another recent change that further slowed down SU for 2nd neighbor interaction. After introducing LocalCircuit in #347 (and its predecessor TrotterGates in #339), each term in the Hamiltonian is individually exponentiated to a gate in the LocalCircuit. Nearest neighbor and next-nearest neighbor terms are no longer grouped together when exponentiating, resulting in an increase in the number of Trotter gates to be applied.
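The difference between exponentiating grouped terms and exponentiating each term separately can be seen on a toy example. This is a sketch with two 2×2 matrices and the `LinearAlgebra` standard library only; it does not reproduce how LocalCircuit builds its gates, and the names are illustrative.

```julia
using LinearAlgebra

# Two non-commuting toy terms acting on the same two-level system.
A = [0.0 1.0; 1.0 0.0]
B = [1.0 0.0; 0.0 -1.0]
τ = 0.01

grouped = exp(-τ * (A + B))            # one gate for the summed terms
separate = exp(-τ * A) * exp(-τ * B)   # two gates, applied one after the other

# The two differ at second order in τ because A and B do not commute.
splitting_error = opnorm(grouped - separate)
```

Grouping yields a single gate per step, whereas exponentiating term by term yields one gate per term, which is the increase in gate count described above.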

lkdvos

Some small comments, will try and actually look into this in detail next week :)

Yue-Zhengyuan · 2026-05-01T01:48:42Z

    gate_axs = alg.purified ? (1:1) : (1:2)
-    for gate_ax in gate_axs
-        X, a, b, Y = _qr_bond(A, B; gate_ax, positive = true)
+    for gate_ax in gate_axs  # TODO try to use type stable helper function


Would it be better if I always loop over gate_ax in 1:2 and manually break after the first iteration when alg.purified === true?
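The pattern being proposed might look like the sketch below. The function `visited_axes` is a hypothetical helper that only records which axes the loop visits; the real loop body applies the gate.

```julia
# Hypothetical helper: records which gate axes would be visited.
function visited_axes(purified::Bool)
    visited = Int[]
    for gate_ax in 1:2
        push!(visited, gate_ax)   # stand-in for applying the gate on this axis
        purified && break         # purified states only use the first axis
    end
    return visited
end
```

This visits the same axes as the `alg.purified ? (1:1) : (1:2)` range in the diff, while keeping the loop bounds fixed.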

Yue-Zhengyuan · 2026-05-01T04:09:52Z

+    for (site, vertex, open_vaxs) in ((siteA, A, open_vaxs_A), (siteB, B, open_vaxs_B))
+        s′ = (mod1(site[1], Nr), mod1(site[2], Nc))
+        rotated = _bond_rotation(vertex, dir, rev; inv = true)
+        state[s′...] = absorb_weight(rotated, env, s′..., open_vaxs; inv = true)


The state tensors need normalization, especially when evolving for the ground state; otherwise, the norm will explode.
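A toy version of the problem, assuming a plain vector in place of the state tensors and a diagonal non-unitary matrix in place of the imaginary-time gate (both are illustrative, not PEPSKit objects):

```julia
using LinearAlgebra

gate = [10.0 0.0; 0.0 5.0]   # toy non-unitary "imaginary-time" gate

function evolve(v0, gate, nsteps; renormalize::Bool)
    v = copy(v0)
    for _ in 1:nsteps
        v = gate * v
        renormalize && normalize!(v)   # rescale to unit norm after every step
    end
    return v
end

v_raw = evolve([1.0, 1.0], gate, 400; renormalize = false)
v_normed = evolve([1.0, 1.0], gate, 400; renormalize = true)
```

Without the rescaling, the entries overflow to `Inf` in double precision; with it, the vector keeps unit norm and converges towards the dominant direction.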

Yue-Zhengyuan · 2026-05-01T08:47:29Z

+    rightpermuted = permute(last(Ms), right_invperm)
+    state[s′] = absorb_weight(rightpermuted, env, s′, right_vaxs; inv = true)
+
+    for (vertex, s, invperm, vaxs) in zip(Ms[(begin + 1):(end - 1)], sites[(begin + 1):(end - 1)], map(t -> t[3], mids), map(t -> t[2], mids))


@ogauthe Is it really that important to separate out the first and last tensors for type stability because they have 3 instead of 2 open virtual legs?

Can I mitigate the problem by wrapping contents in the for-loop as a function?

I think the speedup for the J1-J2 test mainly comes from the improvement in applying nearest neighbor gates. In my test, separating out the first and the last tensors of the cluster offers little gain in performance, especially since we currently only have examples with up to 3-site MPO gates. To reduce redundancy, I'll restore the old approach, but with more things wrapped inside dedicated functions.

I think a function barrier can indeed mitigate some of it, but it will still incur a dynamic dispatch. That may be completely fine, though: dynamic dispatch by itself doesn't cost much as long as the type instability doesn't leak out of that region and taint the subsequent code.
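A minimal sketch of the function-barrier idea, using only Base Julia. The names `squared_norm` and `total_squared_norm` are hypothetical and unrelated to the PR's code.

```julia
# Kernel behind the barrier: Julia compiles a specialised method per concrete input type.
function squared_norm(x)::Float64
    return Float64(sum(abs2, x))
end

# The container is deliberately untyped, so the element type is only known at run time.
function total_squared_norm(items::Vector{Any})
    acc = 0.0
    for x in items
        # One dynamic dispatch per element; the declared return type keeps `acc` a Float64.
        acc += squared_norm(x)
    end
    return acc
end

items = Any[[1.0, 2.0], Float32[3.0], [1 2; 3 4]]
result = total_squared_norm(items)
```

The loop still pays for one run-time method lookup per element, but the work inside `squared_norm` runs on concrete types and the instability does not propagate into `acc`.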

Yue-Zhengyuan · 2026-05-01T08:57:29Z

+    local wts
+    for gate_ax in 1:2
+        _apply_gatempo!(Ms, gates; gate_ax)
+        wts, ϵs = _cluster_truncate!(Ms, truncs)


I just realized that I recently updated _cluster_truncate! in #348, so it can handle MPSs with any number of physical legs (and even different numbers of physical legs at each site). So we no longer need to first fuse the physical legs for PEPO.

Yue-Zhengyuan · 2026-05-01T09:10:09Z

I know it is possible to do much better.

@ogauthe Do you mind briefly outlining what can be further improved starting from the current PR?

ogauthe added 8 commits April 12, 2026 14:23

parametrize SimpleUpdate{Trunc}

a572e07

improve stability

9e0222b

improve tuples stability

c8798e2

remove debug

dd7c58b

remove get_cluster

d2b37af

factorize _su_iter_gate

ada6ba7

type stable absorb_weight

83faeff

WIP typing for _su_iter_mpo

8a5c196

avoid permuting twice in absorb_weight

75cfad4

Yue-Zhengyuan reviewed Apr 17, 2026

suggestion

e3dc4ed

Yue-Zhengyuan mentioned this pull request Apr 21, 2026

Fix FixedSpaceTruncation for simple update #360

Merged

ogauthe added 4 commits April 24, 2026 11:03

add verbosity kwarg

23ea6c0

clean su_iter_mpo

63db27c

_su_iter_mpo type stable

b166ef2

runic

2c30b62

Yue-Zhengyuan reviewed Apr 25, 2026

Comment thread src/algorithms/time_evolution/simpleupdate.jl

Yue-Zhengyuan reviewed Apr 25, 2026

Comment thread src/algorithms/time_evolution/simpleupdate3site.jl Outdated

Comment thread src/environments/suweight.jl

Comment thread test/timeevol/j1j2_finiteT.jl Outdated

lkdvos reviewed Apr 26, 2026

Comment thread src/algorithms/time_evolution/simpleupdate3site.jl Outdated

Comment thread src/algorithms/time_evolution/apply_gate.jl Outdated

Comment thread src/algorithms/time_evolution/apply_mpo.jl Outdated

Comment thread src/algorithms/time_evolution/apply_mpo.jl Outdated

ogauthe added 9 commits April 27, 2026 10:31

Merge branch 'master' into simpleupdate

f52f6f1

use TupleTools

aee67d1

explicit circuit as varname

c0f7070

revert unneeded changes

e96c515

uniformize map

3b53deb

avoid explit type annotation

8430286

keep logs in test

6b29aa9

remove redundant spactype/sectortype

650ac76

move absorb_weights to its own file

c431a00

Yue-Zhengyuan added 2 commits April 30, 2026 22:42

Merge remote-tracking branch 'upstream/master' into simpleupdate

784bd74

Small fix

e85cfe6

Yue-Zhengyuan reviewed May 1, 2026

Yue-Zhengyuan added 4 commits May 1, 2026 09:51

Reduce diff [skip ci]

05ba457

Fix absorb_weight for fermions [skip ci]

8843419

Restore _su_iter! dispatch and tensor normalization [skip ci]

71479e1

Formatting Project.toml [skip ci]

2802f43

Yue-Zhengyuan reviewed May 1, 2026

No longer fuse phys legs for _cluster_truncate! [skip ci]

f76646f

Yue-Zhengyuan reviewed May 1, 2026

Yue-Zhengyuan added 7 commits May 1, 2026 18:12

Periodic weight_to_absorb [skip ci]

db96768

Use Iterators.drop instead of 2:end [skip ci]

fc8ca0b

Bring back _get_cluster and break it apart

712c84f

Bring back verbosity setting

5299f71

Require bond tensor to follow MPS axis order

11652f1

Merge remote-tracking branch 'upstream/master' into simpleupdate

02517cc

Clean up

44f2513

Yue-Zhengyuan marked this pull request as ready for review May 2, 2026 12:30

Docstring for _permute_to_last

8219c6b

lkdvos mentioned this pull request May 3, 2026

(feat) Simple update using BP environments #372

Draft

3 tasks

Yue-Zhengyuan added 2 commits May 4, 2026 08:15

Merge remote-tracking branch 'upstream/master' into simpleupdate

af12292

Update Project.toml

d5c01c1

Yue-Zhengyuan requested a review from lkdvos May 4, 2026 00:51

lkdvos approved these changes May 4, 2026

Comment thread src/algorithms/contractions/bondenv/benv_ctm.jl Outdated

Yue-Zhengyuan and others added 3 commits May 5, 2026 08:11

Merge remote-tracking branch 'upstream/master' into simpleupdate

0878172

Remove unnecessary type specification

dfa5166

Merge branch 'master' into simpleupdate

912f8ca

lkdvos merged commit c5b94ba into QuantumKitHub:master May 5, 2026
62 of 63 checks passed


3 participants
