1 Introduction

An additive set is a nonempty finite subset of an abelian group. The energy of an additive set A is defined to be the number E(A) of quadruples \((a_1, a_2, a_3, a_4)\in A^4\) solving the equation \(a_1+a_2=a_3+a_4\). An easy counting argument shows

$$\begin{aligned} E(A)=\sum _{d\in A-A}r_{A-A}(d)^2\,, \end{aligned}$$
(1.1)

where \(r_{A-A}(d)\) denotes the number of representations of d as a difference of two members of A. So the Cauchy–Schwarz inequality yields \(E(A)\ge |A|^4/|A-A|\) and, in particular, every additive set A with a small difference set \(A-A\) has large energy. In the converse direction, Balog and Szemerédi [2] proved that large energy implies the existence of a substantial subset whose difference set is small. After several quantitative improvements (see, e.g., Gowers [3] and Balog [1]), the best version of this result to date was obtained by the second author [4].
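
For convenience, the Cauchy–Schwarz step can be sketched as follows. The only additional ingredient is the identity \(\sum _{d\in A-A}r_{A-A}(d)=|A|^2\), which holds because every pair in \(A^2\) represents exactly one difference.

$$\begin{aligned} |A|^4 =\Bigl (\sum _{d\in A-A}r_{A-A}(d)\Bigr )^2 \le |A-A|\sum _{d\in A-A}r_{A-A}(d)^2 =|A-A|\,E(A)\,. \end{aligned}$$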

Theorem 1.1

Given a real \(K\ge 1\) every additive set A with energy \(E(A)\ge |A|^3/K\) has a subset \(A'\subseteq A\) of size \(|A'|\ge \Omega (|A|/K)\) such that \(|A'-A'|\le O(K^{4}|A'|)\). \(\square \)

When asking how a quantitatively optimal version of this result might read, there are two different directions one may wish to pursue. First, there is the obvious problem of whether the exponent 4 can be replaced by some smaller number. Second, one may try to find “the largest” set \(A'\subseteq A\) such that \(|A'-A'|\le O_K(|A'|)\) holds. As the following example demonstrates, there is no absolute constant \(\varepsilon _\star >0\) such that \(|A'|\ge (1+\varepsilon _\star )K^{-1/2}|A|\) can be achieved in general.

Fix an arbitrary natural number n. For a very large finite abelian group G we consider the additive set

$$\begin{aligned} A=\bigl \{(g_1, \ldots , g_n)\in G^n:\text { there is at most one index }i\text { such that }g_i\ne 0\bigr \} \end{aligned}$$

whose ambient group is \(G^n\). Obviously we have

$$\begin{aligned} |A|=|G|n+O_n(1) \quad \text { and } \quad E(A)=|A|^3/n^2+O_n(|A|^2), \end{aligned}$$

so the real number K satisfying \(E(A)=|A|^3/K\) is roughly \(n^2\). However, every \(A'\subseteq A\) of size \(|A'|\ge (1+\varepsilon )|G|\) satisfies \(|A'-A'|\ge \varepsilon ^2 |G|^2\). Our main result implies that this is, in some sense, already the worst example. More precisely, for every fixed \(\varepsilon >0\) the Balog–Szemerédi–Gowers theorem holds with \(|A'|\ge (1-\varepsilon )K^{-1/2}|A|\). Perhaps surprisingly, we can also reproduce the best known factor \(K^4\).
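
The estimates for the example above can be traced by a rough count. Write \(H_i\subseteq G^n\) for the i-th coordinate axis (a notation introduced only for this remark), so that A is the union of the n subgroups \(H_i\cong G\), any two of which meet only in 0. Quadruples lying in a single axis give the main term of the energy. One can check that a quadruple touching two different axes in nonzero points is determined by two of its entries up to boundedly many choices, so such quadruples only contribute to the error term. Under these assumptions,

$$\begin{aligned} |A|=n(|G|-1)+1, \qquad E(A)=n|G|^3+O_n(|G|^2), \qquad K=\frac{|A|^3}{E(A)}=n^2+O_n(|G|^{-1}). \end{aligned}$$

In particular, \(K^{-1/2}|A|\) is asymptotically \(|G|\). A subset of size \((1+\varepsilon _\star )K^{-1/2}|A|\) therefore has about \((1+\varepsilon _\star )|G|\) elements and cannot fit inside a single axis. Its difference set then has size quadratic in \(|G|\), which is not \(O_K(|A'|)\) as \(|G|\rightarrow \infty \).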

Theorem 1.2

Given real numbers \(K\ge 1\), \(\varepsilon \in (0, 1/2)\), and an additive set A with energy \(E(A)\ge |A|^3/K\) there is a subset \(A'\subseteq A\) such that

$$\begin{aligned} |A'|\ge (1-\varepsilon )K^{-1/2}|A| \quad \text { and } \quad |A'-A'|\le 2^{33}\varepsilon ^{-9}K^{4}|A'|=O_\varepsilon (K^4|A'|). \end{aligned}$$

Our proof has two main cases and in one of them (see Lemma 3.1 below) we even get the stronger bound \(|A'-A'|\le O_\varepsilon (K^3|A'|)\). It would be interesting to prove this in the second case as well. Using examples of the form \(A=\{x\in {\mathbb {Z}}^d:\Vert x\Vert \le R\}\) one can show that the exponent 4 cannot be replaced by any number smaller than \(\log (4)/\log (27/16)\approx 2.649\) (see [5]).

2 Preliminaries

This section discusses two auxiliary results we shall require for the proof of Theorem 1.2. The first of them is similar to [6, Lemma 6.19].

Lemma 2.1

If \(\delta , \xi \in (0, 1]\) and \(R\subseteq A^2\) denotes a binary relation on a set A such that \(|R|\ge \delta |A|^2\), then there is a set \(A'\subseteq A\) of size \(|A'|\ge \delta (1-\xi ) |A|\) which possesses the following property: For every pair \((a_1, a_2)\in A'^2\) there are at least \(2^{-7}\delta ^4\xi ^4|A|^2|A'|\) triples \((x, b, y)\in A^3\) such that \((a_1, x), (b, x), (b, y), (a_2, y)\in R\).

Proof

Set \(N(x)=\{a\in A:(a, x)\in R\}\) for every \(x\in A\). Since \(\sum _{x\in A}|N(x)|=|R|\ge \delta |A|^2\), the Cauchy–Schwarz inequality yields

$$\begin{aligned} \sum _{x\in A}|N(x)|^2\ge \delta ^2|A|^3\,. \end{aligned}$$
(2.1)

Setting \(K(a, a')=\{x\in A:a, a'\in N(x)\}\) for every pair \((a, a')\in A^2\) and

$$\begin{aligned} \Omega = \bigl \{(a, a')\in A^2:|K(a, a')|\le \delta ^2\xi ^2|A|/8\bigr \} \end{aligned}$$

a double counting argument yields

$$\begin{aligned} \sum _{x\in A}|N(x)^2\cap \Omega | = \sum _{(a, a')\in \Omega }|K(a, a')| \le \delta ^2\xi ^2|A||\Omega |/8 \le \delta ^2\xi ^2|A|^3/8. \end{aligned}$$

Together with (2.1) we obtain

$$\begin{aligned} \sum _{x\in A} \bigl (|N(x)|^2-8\xi ^{-1}|N(x)^2\cap \Omega |\bigr ) \ge \delta ^2(1-\xi )|A|^3 \end{aligned}$$

and, hence, there exists some \(x_\star \in A\) such that the set \(A_\star =N(x_\star )\) satisfies

$$\begin{aligned} |A_\star |^2-8\xi ^{-1}|A_\star ^2\cap \Omega |\ge \delta ^2(1-\xi )|A|^2\,. \end{aligned}$$
(2.2)
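
The passage from the two preceding displays to (2.2) is a subtraction followed by averaging over \(x\in A\). A sketch of the arithmetic:

$$\begin{aligned} \sum _{x\in A}|N(x)|^2-8\xi ^{-1}\sum _{x\in A}|N(x)^2\cap \Omega | \ge \delta ^2|A|^3-8\xi ^{-1}\cdot \frac{\delta ^2\xi ^2|A|^3}{8} = \delta ^2(1-\xi )|A|^3\,. \end{aligned}$$

Since the sum on the left has \(|A|\) terms, at least one of them is at least \(\delta ^2(1-\xi )|A|^2\), which is (2.2). Dropping the nonnegative term \(8\xi ^{-1}|A_\star ^2\cap \Omega |\) from (2.2) also shows \(|A_\star |\ge \delta (1-\xi )^{1/2}|A|\).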

We shall prove that the set

$$\begin{aligned} A'=\{a\in A_\star :\text { the number of all }a'\in A_\star \text { with }(a, a')\in \Omega \text { is at most }|A_\star |/4\} \end{aligned}$$

has all required properties. By (2.2) we have

$$\begin{aligned} |A_\star \smallsetminus A'||A_\star |/4 \le |A_\star ^2\cap \Omega | \le \xi |A_\star |^2/8, \end{aligned}$$

for which reason

$$\begin{aligned} |A'| \ge (1-\xi /2)|A_\star | \ge (1-\xi )^{1/2}|A_\star | \overset{(2.2)}{\ge } \delta (1-\xi ) |A|, \end{aligned}$$

meaning that \(A'\) is indeed sufficiently large. To conclude the proof we need to show

$$\begin{aligned} \sum _{b\in A}|K(a_1, b)\times K(b, a_2)| \ge 2^{-7}\delta ^4\xi ^4|A|^2|A'| \end{aligned}$$

for every pair \((a_1, a_2)\in A'^2\). This follows from the definition of \(A'\): each of \(a_1\) and \(a_2\) forms a pair in \(\Omega \) with at most \(|A_\star |/4\) elements of \(A_\star \). As \(K(b, a_2)=K(a_2, b)\), this leaves at least \(|A_\star |/2\) elements \(b\in A_\star \) such that the sets \(K(a_1, b)\) and \(K(b, a_2)\) both have size greater than \(\delta ^2\xi ^2|A|/8\). \(\square \)
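
For completeness, the two numerical facts used at the end of this proof can be checked directly. The inequality \(1-\xi /2\ge (1-\xi )^{1/2}\) follows from \((1-\xi /2)^2=1-\xi +\xi ^2/4\), and the final count is

$$\begin{aligned} \frac{|A_\star |}{2}\cdot \Bigl (\frac{\delta ^2\xi ^2|A|}{8}\Bigr )^2 = 2^{-7}\delta ^4\xi ^4|A|^2|A_\star | \ge 2^{-7}\delta ^4\xi ^4|A|^2|A'|\,, \end{aligned}$$

where the last step uses \(A'\subseteq A_\star \).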

Lemma 2.2

Suppose that the real numbers \(x_1, \dots , x_n\in [0, 1]\) do not vanish simultaneously. Denote their sum by S and the sum of their squares by T. For every \(\alpha \in (0, 1)\) there exists a set \(I\subseteq [n]\) such that

$$\begin{aligned} \sum _{i\in I}x_i \ge \max \left\{ \alpha T, \biggl (\frac{(1-\alpha )^5|I|^4T^4}{2^{10}S^2}\biggr )^{1/6}\right\} . \end{aligned}$$

Proof

For reasons of symmetry we may assume \(x_1\ge \dots \ge x_n\). Set \(S_i=\sum _{j=1}^i x_j\) for every nonnegative integer \(i\le n\). Due to \(T\le x_1 S\) and \(x_1\le 1\) we have \(T\le S=S_n\) and thus there exists a smallest index \(k\in [n]\) satisfying \(S_k\ge \alpha T\). Notice that

$$\begin{aligned} \sum _{i=1}^{k-1}x_i^2 \le \sum _{i=1}^{k-1}x_i =S_{k-1} \le \alpha T. \end{aligned}$$

Moreover \(x_1\ge T/S\) implies the existence of a largest index \(\ell \) such that \(x_\ell \ge (1-\alpha )T/(2\,S)\). Due to

$$\begin{aligned} \sum _{i=\ell +1}^n x_i^2 \le \frac{(1-\alpha )T}{2S}\sum _{i=\ell +1}^n x_i \le \frac{(1-\alpha )T}{2}\,, \end{aligned}$$

we have

$$\begin{aligned} \sum _{i=k}^{\ell } x_i^2 \ge \frac{(1-\alpha )T}{2}\,, \end{aligned}$$
(2.3)

whence, in particular, \(\ell \ge k\). Next,

$$\begin{aligned} \ell \biggl (\frac{(1-\alpha )T}{2S}\biggr )^2 \le \sum _{i=1}^\ell x_i^2 \le T \end{aligned}$$

entails

$$\begin{aligned} (1-\alpha )^2\ell T\le 4 S^2\,. \end{aligned}$$
(2.4)

Now assume for the sake of contradiction that our claim fails. Every \(i\in [k, \ell ]\) satisfies \(S_i\ge S_k\ge \alpha T\) and thus the failure of \(I=[i]\) discloses

$$\begin{aligned} S_i<\biggl (\frac{(1-\alpha )^5i^4T^4}{2^{10}S^2}\biggr )^{1/6}\,. \end{aligned}$$

Combined with \(ix_i\le S_i\) this entails

$$\begin{aligned} \sum _{i=k}^\ell x_i^2 \le \biggl (\frac{(1-\alpha )^5T^4}{2^{10}S^2}\biggr )^{1/3}\sum _{i=k}^\ell i^{-2/3}. \end{aligned}$$

In view of (2.3) we are thus led to

$$\begin{aligned} \biggl (\frac{2^7S^2}{(1-\alpha )^2T}\biggr )^{1/3} \le \sum _{i=k}^\ell i^{-2/3} \le \int _0^{\ell }x^{-2/3} \textrm{d}x = 3\ell ^{1/3}, \end{aligned}$$

i.e., \(2^7S^2\le 27(1-\alpha )^2\ell T\), which contradicts (2.4). \(\square \)
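
The step from (2.3) to the last display is pure algebra, sketched here for convenience. Abbreviating \(C=(1-\alpha )^5T^4/(2^{10}S^2)\), a symbol used only in this remark, (2.3) and the preceding estimate give \((1-\alpha )T/2\le C^{1/3}\sum _{i=k}^\ell i^{-2/3}\), and

$$\begin{aligned} \frac{(1-\alpha )T}{2}\,C^{-1/3} = \biggl (\frac{(1-\alpha )^3T^3}{2^3}\cdot \frac{2^{10}S^2}{(1-\alpha )^5T^4}\biggr )^{1/3} = \biggl (\frac{2^7S^2}{(1-\alpha )^2T}\biggr )^{1/3}. \end{aligned}$$

Cubing the resulting inequality yields \(2^7S^2\le 27(1-\alpha )^2\ell T\), and (2.4) bounds the right side by \(108\,S^2\). This is impossible because \(S>0\) and \(108<128=2^7\).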

3 The proof of Theorem 1.2

Let us fix two real numbers \(K\ge 1\) and \(\varepsilon \in (0, 1/2)\) as well as an additive set A satisfying \(E(A)\ge |A|^3/K\). We consider the partition

$$\begin{aligned} A-A=P\mathbin {\dot{\cup }}Q \end{aligned}$$

defined by

$$\begin{aligned} P&=\bigl \{d\in A-A:r_{A-A}(d)\ge K^{-1/2}|A|\bigr \} \\ \text { and } \quad Q&=\bigl \{d\in A-A:r_{A-A}(d) < K^{-1/2}|A|\bigr \} \,. \end{aligned}$$

According to (1.1) at least one of the cases

$$\begin{aligned} \sum _{d\in P}r_{A-A}(d)^2 \ge \frac{\varepsilon |A|^3}{4K} \quad \text { or } \quad \sum _{d\in Q}r_{A-A}(d)^2 \ge \frac{(4-\varepsilon )|A|^3}{4K} \end{aligned}$$
(3.1)

needs to occur and we begin by analysing the left alternative.
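
A quick check of this dichotomy: if both alternatives in (3.1) failed, then adding the two strict inequalities and using (1.1) would give

$$\begin{aligned} E(A)=\sum _{d\in P}r_{A-A}(d)^2+\sum _{d\in Q}r_{A-A}(d)^2 < \frac{\varepsilon |A|^3}{4K}+\frac{(4-\varepsilon )|A|^3}{4K} = \frac{|A|^3}{K}\,, \end{aligned}$$

contrary to the assumption \(E(A)\ge |A|^3/K\).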

Lemma 3.1

If \(\sum _{d\in P}r_{A-A}(d)^2\ge \varepsilon |A|^3/(4K)\), then there exists a set \(A'\subseteq A\) of size \(|A'|\ge (1-\varepsilon )K^{-1/2}|A|\) such that \(|A'-A'|\le 2^{10}\varepsilon ^{-4} K^3|A'|\).

Proof

For every difference \(d\in P\) we set \(A_d=A\cap (A+d)\). Due to \(|A_d|=r_{A-A}(d)\) the hypothesis implies

$$\begin{aligned} \sum _{d\in P}|A_d|^2\ge \varepsilon |A|^3/(4K)\,. \end{aligned}$$
(3.2)

For every pair \((x, y)\in A^2\) the set \(L(x, y)=\{d\in P:x, y\in A_d\}\) has cardinality \(|L(x, y)|\le r_{A-A}(x-y)\), because every difference \(d\in L(x, y)\) gives rise to its own representation \(x-y=(x-d)-(y-d)\) of \(x-y\) as a difference of two members of A. Applying this observation to all pairs in

$$\begin{aligned} \Xi =\bigl \{(x, y)\in A^2:r_{A-A}(x-y)\le \varepsilon ^2|A|/(16K)\bigr \} \end{aligned}$$

we obtain

$$\begin{aligned} \sum _{d\in P}|A_d^2\cap \Xi | = \sum _{(x, y)\in \Xi }|L(x, y)| \le \sum _{(x, y)\in \Xi }r_{A-A}(x-y) \le \frac{\varepsilon ^2|A||\Xi |}{16K} \le \frac{\varepsilon ^2|A|^3}{16K}. \end{aligned}$$

Together with (3.2) this yields

$$\begin{aligned} \sum _{d\in P}\bigl (\varepsilon |A_d^2|-4|A_d^2\cap \Xi |\bigr )\ge 0 \end{aligned}$$

and, consequently, for some element \(d(\star )\in P\) the set \(A_\star =A_{d(\star )}\) satisfies \(|A_\star ^2\cap \Xi |\le \varepsilon |A_\star |^2/4\). We contend that the set

$$\begin{aligned} A'=\bigl \{a\in A_\star :\text { there are at most }|A_\star |/4\text { elements }x\in A_\star \text { with }(a, x)\in \Xi \bigr \} \end{aligned}$$

has the required properties. As in the proof of Lemma 2.1 we obtain

$$\begin{aligned} |A'|\ge (1-\varepsilon )|A_\star | = (1-\varepsilon )r_{A-A}(d(\star ))\ge (1-\varepsilon )K^{-1/2}|A|; \end{aligned}$$

so it remains to derive the required upper bound on \(|A'-A'|\).
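
Both the existence of \(d(\star )\) and the size estimate rest on short computations, sketched here. Multiplying (3.2) by \(\varepsilon \) and subtracting four times the double count of \(\sum _{d\in P}|A_d^2\cap \Xi |\) gives

$$\begin{aligned} \sum _{d\in P}\bigl (\varepsilon |A_d|^2-4|A_d^2\cap \Xi |\bigr ) \ge \frac{\varepsilon ^2|A|^3}{4K}-\frac{4\varepsilon ^2|A|^3}{16K}=0\,, \end{aligned}$$

so some summand is nonnegative. Moreover, every \(a\in A_\star \smallsetminus A'\) is the first entry of more than \(|A_\star |/4\) pairs in \(A_\star ^2\cap \Xi \), whence

$$\begin{aligned} |A_\star \smallsetminus A'|\cdot \frac{|A_\star |}{4} \le |A_\star ^2\cap \Xi | \le \frac{\varepsilon |A_\star |^2}{4} \quad \text { and thus } \quad |A_\star \smallsetminus A'|\le \varepsilon |A_\star |\,. \end{aligned}$$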

To this end we consider an arbitrary pair \((a, a')\) of elements of \(A'\). Owing to the definition of \(A'\) there are at least \(|A_\star |/2\) elements \(x\in A_\star \) such that \((a, x)\not \in \Xi \) and \((a', x)\not \in \Xi \). For each of them we have \(a-a'=(a-x)-(a'-x)\). Moreover, there are at least \(\varepsilon ^2|A|/(16K)\) pairs \((a_1, a_2)\in A^2\) solving the equation \(a-x=a_1-a_2\) and at least the same number of pairs \((a_3, a_4)\in A^2\) such that \(a'-x=a_3-a_4\). Altogether there are at least

$$\begin{aligned} \varepsilon ^4|A|^2|A_\star |/(2^{9}K^2) \ge 2^{-9}\varepsilon ^4K^{-5/2}|A|^3 \end{aligned}$$

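ways of writing \(a-a'=(a_1-a_2)-(a_3-a_4)\) with \((a_1, a_2, a_3, a_4)\in A^4\). Since there are only \(|A|^4\) such quadruples and each of them represents a single element of the ambient group, we have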

$$\begin{aligned} |A'-A'| \le \frac{|A|^4}{2^{-9}\varepsilon ^4K^{-5/2}|A|^3} = 2^9\varepsilon ^{-4}K^{5/2}|A| \le 2^{10}\varepsilon ^{-4}K^3 |A'|. \end{aligned}$$

\(\square \)
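
The two numerical steps in this argument can be checked as follows. The number of representations is the product of the number of admissible x with the two pair counts, and the final inequality uses \(|A'|\ge (1-\varepsilon )K^{-1/2}|A|\) together with \(\varepsilon <1/2\):

$$\begin{aligned} \frac{|A_\star |}{2}\Bigl (\frac{\varepsilon ^2|A|}{16K}\Bigr )^2=\frac{\varepsilon ^4|A|^2|A_\star |}{2^9K^2}\,, \qquad |A|\le \frac{K^{1/2}|A'|}{1-\varepsilon }\le 2K^{1/2}|A'|\,. \end{aligned}$$

Substituting the second estimate into \(2^9\varepsilon ^{-4}K^{5/2}|A|\) gives the bound \(2^{10}\varepsilon ^{-4}K^3|A'|\).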

We conclude the proof of Theorem 1.2 by taking care of the right case in (3.1).

Lemma 3.2

If \(\sum _{d\in Q}r_{A-A}(d)^2\ge (1-\varepsilon /4)|A|^3/K\), then there is a set \(A'\subseteq A\) of size \(|A'|\ge (1-\varepsilon )K^{-1/2}|A|\) such that \(|A'-A'|\le 2^{33}\varepsilon ^{-9}K^{4}|A'|\).

Proof

Let \(d_1, \ldots , d_{|Q|}\) be an enumeration of Q. By the definition of Q there are real numbers \(x_1, \ldots , x_{|Q|}\in [0, 1]\) such that

$$\begin{aligned} r_{A-A}(d_i) = x_iK^{-1/2}|A| \quad \text { holds for every } i\in [|Q|]. \end{aligned}$$

Owing to \(\sum _{d\in A-A} r_{A-A}(d)=|A|^2\) and the hypothesis we have

$$\begin{aligned} \sum _{i=1}^{|Q|}x_i \le K^{1/2} |A| \quad \text { as well as } \quad \sum _{i=1}^{|Q|}x_i^2 \ge (1-\varepsilon /4)|A|. \end{aligned}$$
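
These two bounds are obtained by substituting \(r_{A-A}(d_i)=x_iK^{-1/2}|A|\) into the identity and into the hypothesis. In sketch form:

$$\begin{aligned} K^{-1/2}|A|\sum _{i=1}^{|Q|}x_i&=\sum _{d\in Q}r_{A-A}(d) \le \sum _{d\in A-A}r_{A-A}(d)=|A|^2\,, \\ K^{-1}|A|^2\sum _{i=1}^{|Q|}x_i^2&=\sum _{d\in Q}r_{A-A}(d)^2 \ge (1-\varepsilon /4)|A|^3/K\,. \end{aligned}$$

In the notation of Lemma 2.2 this says \(S\le K^{1/2}|A|\) and \(T\ge (1-\varepsilon /4)|A|\).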

By Lemma 2.2 applied with \(\alpha =1-\varepsilon /4\) there exists an index set \(I\subseteq [|Q|]\) such that

$$\begin{aligned} \sum _{i\in I}x_i \ge \max \left\{ (1-\varepsilon /2)|A|, \bigl (2^{-21}\varepsilon ^5K^{-1}|I|^4|A|^2\bigr )^{1/6}\right\} \,. \end{aligned}$$
(3.3)
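
To see how (3.3) follows from Lemma 2.2, one can substitute \(1-\alpha =\varepsilon /4\), \(S\le K^{1/2}|A|\), and \(T\ge (1-\varepsilon /4)|A|\). The following sketch uses the elementary estimates \((1-\varepsilon /4)^2\ge 1-\varepsilon /2\) and, because \(\varepsilon <1/2\), \((1-\varepsilon /4)^4\ge (7/8)^4>1/2\):

$$\begin{aligned} \alpha T&\ge (1-\varepsilon /4)^2|A|\ge (1-\varepsilon /2)|A|\,, \\ \frac{(1-\alpha )^5|I|^4T^4}{2^{10}S^2}&\ge \frac{\varepsilon ^5}{2^{10}}\cdot \frac{(1-\varepsilon /4)^4|I|^4|A|^4}{2^{10}K|A|^2} \ge 2^{-21}\varepsilon ^5K^{-1}|I|^4|A|^2\,. \end{aligned}$$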

Now we set \(Q'=\{d_i:i\in I\}\), consider the relation

$$\begin{aligned} R=\{(a_1, a_2)\in A^2:a_1-a_2\in Q'\} \end{aligned}$$

and define \(\delta \in (0, 1]\) by \(|R|=\delta |A|^2\). Due to

$$\begin{aligned} \delta = |A|^{-2}\sum _{i\in I}r_{A-A}(d_i) = \frac{1}{K^{1/2}|A|}\sum _{i\in I} x_i \end{aligned}$$

the bounds in (3.3) imply both

$$\begin{aligned} \delta \ge (1-\varepsilon /2)K^{-1/2} \quad \text { and } \quad \frac{|I|^4}{\delta ^6|A|^4} \le 2^{21}\varepsilon ^{-5}K^4\,. \end{aligned}$$
(3.4)
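
The second estimate in (3.4) comes from raising the second bound in (3.3) to the sixth power and inserting \(\sum _{i\in I}x_i=\delta K^{1/2}|A|\). A sketch:

$$\begin{aligned} \delta ^6K^3|A|^6 = \Bigl (\sum _{i\in I}x_i\Bigr )^6 \ge 2^{-21}\varepsilon ^5K^{-1}|I|^4|A|^2 \quad \Longrightarrow \quad \frac{|I|^4}{\delta ^6|A|^4}\le 2^{21}\varepsilon ^{-5}K^4\,. \end{aligned}$$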

By Lemma 2.1 applied to \(\xi =\varepsilon /2\) and R there exists a set \(A'\subseteq A\) of size

$$\begin{aligned} |A'|\ge (1-\varepsilon /2)\delta |A|\ge (1-\varepsilon )K^{-1/2}|A| \end{aligned}$$

such that for every pair \((a_1, a_2)\in A'^2\) there are at least \(2^{-11}\varepsilon ^4\delta ^4|A|^2|A'|\) triples \((x, b, y)\in A^3\) with \((a_1, x), (b, x), (b, y), (a_2, y)\in R\). Due to the equation

$$\begin{aligned} (a_1-a_2)=(a_1-x)-(b-x)+(b-y)-(a_2-y) \end{aligned}$$

this means that every difference \(a_1-a_2\in A'-A'\) has at least \(2^{-11}\varepsilon ^4\delta ^4|A|^2|A'|\) representations of the form \(q_1-q_2+q_3-q_4\) with \(q_1, q_2, q_3, q_4\in Q'\), whence

$$\begin{aligned} |A'-A'| \le \frac{|Q'|^4}{2^{-11}\varepsilon ^4\delta ^4|A|^2|A'|} \overset{(3.4)}{\le } 2^{32}\varepsilon ^{-9}K^4(\delta |A|/|A'|)^2|A'|. \end{aligned}$$

Due to \(|A'|\ge (1-\varepsilon /2)\delta |A|\ge \delta |A|/\sqrt{2}\) the result follows. \(\square \)
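
The constants in this proof can be traced as follows. This is only a bookkeeping sketch. The factor \(2^{-11}\varepsilon ^4\) arises from Lemma 2.1 as \(2^{-7}\xi ^4\) with \(\xi =\varepsilon /2\), and the size bound uses \((1-\varepsilon /2)^2\ge 1-\varepsilon \). With \(|Q'|=|I|\) and (3.4) we get

$$\begin{aligned} \frac{|I|^4}{2^{-11}\varepsilon ^4\delta ^4|A|^2|A'|} \le \frac{2^{21}\varepsilon ^{-5}K^4\delta ^6|A|^4}{2^{-11}\varepsilon ^4\delta ^4|A|^2|A'|} = 2^{32}\varepsilon ^{-9}K^4\,\frac{\delta ^2|A|^2}{|A'|}\,, \qquad \Bigl (\frac{\delta |A|}{|A'|}\Bigr )^2\le 2\,. \end{aligned}$$

The second estimate holds because \(1-\varepsilon /2>3/4>1/\sqrt{2}\) for \(\varepsilon <1/2\). Multiplying the two bounds gives \(|A'-A'|\le 2^{33}\varepsilon ^{-9}K^4|A'|\).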