• #### First extragalactic measurement of the turbulence driving parameter: ALMA observations of the star-forming region N159E in the Large Magellanic Cloud (ver. 2)

Studying the driving modes of turbulence is important for characterizing the impact of turbulence in various astrophysical environments. The driving mode of turbulence is parameterized by $b$, which relates the width of the gas density PDF to the turbulent Mach number; $b\approx 1/3$, $1$, and $0.4$ correspond to driving that is solenoidal, compressive, and a natural mixture of the two, respectively. In this work, we use high-resolution (sub-pc) ALMA $^{12}$CO ($J$ = $2-1$), $^{13}$CO ($J$ = $2-1$), and C$^{18}$O ($J$ = $2-1$) observations of filamentary molecular clouds in the star-forming region N159E (the Papillon Nebula) in the Large Magellanic Cloud (LMC) to provide the first measurement of the turbulence driving parameter in an extragalactic region. We use a non-local thermodynamic equilibrium (NLTE) analysis of the CO isotopologues to construct a gas density PDF, which we find to be largely log-normal in shape, with some intermittent features indicating deviations from log-normality. We find that the width of the log-normal part of the density PDF is comparable to the supersonic turbulent Mach number, resulting in $b \approx 0.9$. This implies that the driving mode of turbulence in N159E is primarily compressive. We speculate that the compressive turbulence could have been powered by gravo-turbulent fragmentation of the molecular gas, or by compression driven by H I flows that led to the development of the molecular filaments observed by ALMA in the region. Our analysis can be easily applied to study the nature of turbulence driving in resolved star-forming regions in the local as well as the high-redshift Universe.
Turbulence, Atacama Large Millimeter Array, Non Local Thermodynamic Equilibrium, Large Magellanic Cloud, Local thermal equilibrium, Mach number, Star-forming region, Molecular cloud, Velocity dispersion, Optically thick medium, ...
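
The abstract says $b$ links the density-PDF width to the Mach number but gives no formula. A common form in the turbulence literature is $\sigma_s^2 = \ln(1 + b^2 \mathcal{M}^2)$ for $s = \ln(\rho/\rho_0)$. The minimal sketch below assumes that relation (it is not quoted from this paper) and inverts it for $b$. The `classify` helper is hypothetical: it simply picks the closest of the three reference values listed above.

```python
import math


def driving_parameter(sigma_s: float, mach: float) -> float:
    """Invert sigma_s^2 = ln(1 + b^2 M^2) for the driving parameter b.

    Assumed relation, not taken from the paper.
    sigma_s: standard deviation of s = ln(rho / rho_0) (log-normal part of the PDF)
    mach:    turbulent (sonic) Mach number
    """
    if mach <= 0:
        raise ValueError("Mach number must be positive")
    # expm1(x) = e^x - 1, accurate even for small sigma_s
    return math.sqrt(math.expm1(sigma_s ** 2)) / mach


def classify(b: float) -> str:
    """Hypothetical helper: nearest of the reference values quoted in the abstract."""
    refs = {"solenoidal": 1 / 3, "mixed": 0.4, "compressive": 1.0}
    return min(refs, key=lambda name: abs(refs[name] - b))
```

As a round trip, a PDF generated with $b = 0.9$ and $\mathcal{M} = 5$ has $\sigma_s = \sqrt{\ln(1 + 4.5^2)}$, and `driving_parameter` recovers 0.9 from it. These inputs are illustrative only and are not the N159E measurements.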
• #### Chirality in Astrophysics

Chirality, or handedness, enters astrophysics in three distinct ways. Magnetic field and vortex lines tend to be helical and have a systematic twist in the northern and southern hemispheres of a star or a galaxy; here, helicity is driven by external factors. Chirality can also enter at the microphysical level, where it can be traced back to the parity-breaking weak force. Finally, chirality can arise spontaneously, but this requires not only the presence of an instability but also the action of nonlinearity. Examples can be found both in magnetohydrodynamics and in astrobiology, where homochirality among biomolecules was probably established at the origin of life. In this review, all three types of chirality production are explored and compared.
Chirality, Helicity, Magnetic helicity, Xenobiology, Magnetohydrodynamics, Chiral magnetic effect, Instability, Parity violation, Star, Weak interaction, ...
• #### Activation Landscapes as a Topological Summary of Neural Network Performance

We use topological data analysis (TDA) to study how data transforms as it passes through successive layers of a deep neural network (DNN). We compute the persistent homology of the activation data for each layer of the network and summarize this information using persistence landscapes. The resulting feature map provides both an informative visualization of the network and a kernel for statistical analysis and machine learning. We observe that the topological complexity often increases with training and that the topological complexity does not decrease with each layer.
Deep Neural Networks, Persistent homology, Multi-layer Perceptron, Training set, Architecture, Machine learning, Neural network, Synthetic Data, Vector space, Fully connected layer, ...
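
The minimal sketch below shows how a persistence diagram becomes the landscape feature map described above. It uses the standard definition: $\lambda_k(t)$ is the $k$-th largest value of $\max(0, \min(t - b, d - t))$ over the diagram's (birth, death) pairs. It assumes each layer's diagram has already been computed with a TDA library. The sampling grid and the number of landscapes are hypothetical choices, not the paper's settings.

```python
from typing import List, Sequence, Tuple

Diagram = Sequence[Tuple[float, float]]  # (birth, death) pairs


def tent(birth: float, death: float, t: float) -> float:
    """Piecewise-linear 'tent' function of one persistence pair."""
    return max(0.0, min(t - birth, death - t))


def landscape(diagram: Diagram, k: int, grid: Sequence[float]) -> List[float]:
    """Sample the k-th persistence landscape lambda_k (k >= 1) on a grid of t values."""
    values = []
    for t in grid:
        heights = sorted((tent(b, d, t) for b, d in diagram), reverse=True)
        values.append(heights[k - 1] if k <= len(heights) else 0.0)
    return values


def landscape_vector(diagram: Diagram, num_landscapes: int, grid: Sequence[float]) -> List[float]:
    """Concatenate lambda_1..lambda_K into one fixed-length vector (e.g. one per layer)."""
    vec: List[float] = []
    for k in range(1, num_landscapes + 1):
        vec.extend(landscape(diagram, k, grid))
    return vec
```

For the diagram `[(0, 4), (1, 3)]` sampled at `t = 0, 1, 2, 3, 4`, $\lambda_1$ is `[0, 1, 2, 1, 0]` and $\lambda_2$ is `[0, 0, 1, 0, 0]`. Because every vector has the same length, inner products between them can serve as a kernel.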
• #### Dark Energy Survey Year 3 results: Cosmology with peaks using an emulator approach

We constrain the matter density $\Omega_{\mathrm{m}}$ and the amplitude of density fluctuations $\sigma_8$ within the $\Lambda$CDM cosmological model with shear peak statistics and angular convergence power spectra using mass maps constructed from the first three years of data of the Dark Energy Survey (DES Y3). We use tomographic shear peak statistics, including cross-peaks: peak counts calculated on maps created by taking a harmonic-space product of the convergence of two tomographic redshift bins. Our analysis follows a forward-modelling scheme, building a likelihood of these statistics from N-body simulations with a Gaussian process emulator. We include the following lensing systematics: multiplicative shear bias, photometric redshift uncertainty, and galaxy intrinsic alignment. Stringent scale cuts are applied to avoid biases from unmodelled baryonic physics. We find that the additional non-Gaussian information tightens the constraints on the structure growth parameter, yielding $S_8~\equiv~\sigma_8\sqrt{\Omega_{\mathrm{m}}/0.3}~=~0.797_{-0.013}^{+0.015}$ (68\% confidence limits), with a precision of 1.8\%, an improvement of ~38\% compared to the angular-power-spectra-only case. The results obtained with the angular power spectra and with peak counts agree with each other, and no significant difference in $S_8$ is found. We find a mild tension of $1.5 \thinspace \sigma$ between our study and the results from Planck 2018, with our analysis yielding a lower $S_8$. Furthermore, we observe that the combination of angular power spectra and tomographic peak counts breaks the degeneracy between galaxy intrinsic alignment $A_{\mathrm{IA}}$ and $S_8$, improving cosmological constraints. We run a suite of tests concluding that our results are robust and consistent with the results from other studies using DES Y3 data.
Dark Energy Survey, Sheared, Cosmology, Statistics, Galaxy, Intrinsic alignment, Cosmic shear, Milky Way, Cosmological constraints, Cosmological parameters, ...
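
The arithmetic behind the quoted numbers can be checked directly. The sketch below implements the definition $S_8 \equiv \sigma_8\sqrt{\Omega_{\mathrm{m}}/0.3}$. It also computes a relative precision, assuming that precision means the mean half-width of the 68% interval divided by the central value. That assumption reproduces the quoted 1.8%, but it may not be exactly the paper's definition.

```python
import math


def s8(sigma8: float, omega_m: float) -> float:
    """Structure growth parameter S8 = sigma8 * sqrt(Omega_m / 0.3)."""
    return sigma8 * math.sqrt(omega_m / 0.3)


def relative_precision(value: float, err_minus: float, err_plus: float) -> float:
    """Mean half-width of an asymmetric interval divided by the central value (assumed definition)."""
    return 0.5 * (err_minus + err_plus) / value
```

With $S_8 = 0.797_{-0.013}^{+0.015}$, `relative_precision(0.797, 0.013, 0.015)` gives about 0.0176, which rounds to the 1.8% stated above.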
• #### The eROSITA Final Equatorial-Depth Survey (eFEDS): X-ray Properties and Scaling Relations of Galaxy Clusters and Groups

We investigate the scaling relations between X-ray observables of the clusters detected in the eFEDS field using Spectrum-Roentgen-Gamma/eROSITA observations, taking into account selection effects and the distributions of observables with cosmic time. We extract X-ray observables (Lx, Lbol, T, Mgas, Yx) within R500 for the sample of 542 clusters in the eFEDS field. By applying detection and extent likelihoods, we construct a subsample of 265 clusters with a contamination level of <10% (including AGNs and spurious fluctuations) to be used in the scaling relation analysis. The selection function based on state-of-the-art simulations of the eROSITA sky is fully accounted for in our work. We provide the X-ray observables in the core-included <R500 and core-excised 0.15*R500-R500 apertures for 542 galaxy clusters and groups detected in the eFEDS field. Additionally, we present our best-fit results for the normalization, slope, redshift evolution and intrinsic scatter parameters of the X-ray scaling relations between Lx-T, Lx-Mgas, Lx-Yx, Lbol-T, Lbol-Mgas, Lbol-Yx and Mgas-T. We find that the best-fit slopes deviate significantly from the self-similar model at a >3sigma confidence level; however, our results are in good agreement with simulations including non-gravitational physics and with recent results that take selection effects into account. The strong deviations from the self-similar scenario that we find indicate that non-gravitational effects play an important role in shaping the observed physical state of clusters. This work extends the scaling relations to the low-mass, low-luminosity galaxy cluster and group regime using eFEDS observations, demonstrating eROSITA's ability to measure ICM emission out to R500 with survey-depth exposures and to constrain the scaling relations in a wide mass-luminosity-redshift range.
Scaling law, Luminosity, Selection function, Cluster of galaxies, Intrinsic scatter, Cosmology, Weak lensing mass estimate, Calibration, Cluster sampling, Virial cluster mass, ...
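
Scaling relations such as Lx-T are power laws, fitted in log space as $\log_{10} L = \log_{10} A + B \log_{10}(T/T_{\rm piv})$, where $A$ is the normalization and $B$ the slope. The minimal sketch below uses ordinary least squares. It deliberately ignores the selection function, measurement errors, intrinsic scatter and redshift evolution that the eFEDS analysis models, so it only illustrates what "normalization" and "slope" mean. The pivot value is a hypothetical choice.

```python
import math
from typing import Sequence, Tuple


def fit_power_law(x: Sequence[float], y: Sequence[float], pivot: float) -> Tuple[float, float]:
    """OLS fit of log10(y) = log10(norm) + slope * log10(x / pivot).

    Returns (norm, slope). Selection effects, errors and intrinsic scatter
    are NOT modelled here.
    """
    lx = [math.log10(xi / pivot) for xi in x]
    ly = [math.log10(yi) for yi in y]
    n = len(lx)
    mx, my = sum(lx) / n, sum(ly) / n
    sxx = sum((a - mx) ** 2 for a in lx)
    sxy = sum((a - mx) * (b - my) for a, b in zip(lx, ly))
    slope = sxy / sxx
    intercept = my - slope * mx
    return 10 ** intercept, slope
```

On noiseless synthetic data $y = 3\,(x/2)^{2.5}$ with pivot 2, the fit returns a normalization of 3 and a slope of 2.5. Choosing the pivot near the middle of the sample reduces the correlation between the two fitted parameters.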
• #### Neutrino-electron magnetohydrodynamics in an expanding Universe (ver. 2)

We derive a new model for neutrino-plasma interactions in an expanding universe that incorporates the collective effects of the neutrinos on the plasma constituents. We start from the kinetic description of a multi-species plasma in the flat Friedmann-Robertson-Walker metric, where the particles are coupled to neutrinos through the charged- and neutral-current forms of the weak interaction. We then derive the fluid equations and specialize our model to (a) the lepton epoch, where we consider an electron-positron pair plasma interacting with electron (anti-)neutrinos, and (b) the epoch after electron-positron annihilation, where we model an electron-proton plasma and take the limit of slow ions and inertia-less electrons to obtain a set of neutrino-electron magnetohydrodynamics (NEMHD) equations. In both models, the dynamics of the plasma are affected by the neutrino motion through a ponderomotive force; as a result, new terms appear in the induction equation that can act as a source for magnetic field generation in the early universe. We briefly discuss possible applications of our model.
Neutrino, Positron, Magnetohydrodynamics, Friedmann-Lemaitre-Robertson-Walker metric, Faraday's law of induction, Expanding universe, Electron-positron plasma, Weak interaction, Antineutrino, The early Universe, ...
• #### Cosmological Vlasov-Poisson equations for dark matter: Recent developments and connections to selected plasma problems

The cosmic large-scale structures of the Universe are mainly the result of the gravitational instability of initially small density fluctuations in the dark-matter distribution. Dark matter appears to be initially cold and behaves as a continuous and collisionless medium on cosmological scales, with evolution governed by the gravitational Vlasov--Poisson equations. Cold dark matter can accumulate very efficiently at focused locations, leading to a highly non-linear filamentary network with extreme matter densities. Investigating the non-linear Vlasov--Poisson equations has traditionally been left to massively parallelised numerical simulations. Recently, theoretical progress has made it possible to analyse the mathematical structure of the first infinite densities in the dark-matter distribution by elementary means. We review related advances and point out intriguing connections to classical plasma problems, such as the beam-plasma instability.
Shell crossing, Dark matter, Vlasov-Poisson equation, Phase space, Numerical simulation, Dark Matter Density Profile, Standard perturbation theory, Cold dark matter, Zeldovich approximation, Cosmology, ...
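
The simplest case shows where the "first infinite densities" come from: the one-dimensional Zel'dovich map $x(q) = q + D\,\psi(q)$. This map is exact in 1D until shell crossing. Mass conservation gives $\rho/\bar\rho = 1/|1 + D\,\psi'(q)|$, which diverges when $1 + D\,\psi'(q) = 0$ for the first time. The minimal sketch below is a textbook illustration, not the review's method, and uses the sign convention assumed above. It finds that first shell-crossing growth factor on a grid of Lagrangian positions.

```python
import math
from typing import Callable, Sequence


def density_ratio_1d(dpsi: Callable[[float], float], q: float, growth: float) -> float:
    """rho / rho_bar at Lagrangian position q for x = q + D * psi(q) (valid before shell crossing)."""
    return 1.0 / abs(1.0 + growth * dpsi(q))


def first_shell_crossing(dpsi: Callable[[float], float], grid: Sequence[float]) -> float:
    """Smallest growth factor D at which 1 + D * psi'(q) vanishes somewhere on the grid."""
    most_negative = min(dpsi(q) for q in grid)
    if most_negative >= 0:
        return math.inf  # no converging flow, so no shell crossing
    return -1.0 / most_negative
```

For $\psi(q) = -A \sin q$, $\psi'(q) = -A\cos q$ is most negative at $q = 0$, so shell crossing occurs at $D = 1/A$. With $A = 0.5$ this gives $D = 2$, and halfway there ($D = 1$) the density at $q = 0$ has already doubled.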
• #### Brief on Dark Matter in the Type Ib Seesaw Model: a GeV-scale Dirac neutrino portal

The type Ib seesaw, an alternative explanation for the origin of neutrino mass, provides an intriguing new way to connect neutrino physics to cosmology. In these proceedings, we consider a minimal type Ib seesaw model where the effective neutrino mass operator involves two different Higgs doublets and a heavy Dirac mass. We propose a minimal dark matter extension of this model, in which the heavy Dirac neutrino is coupled to a dark Dirac fermion and a dark complex scalar field, both odd under a discrete $Z_2$ symmetry, where the lighter of the two serves as a dark matter candidate. Focussing on the fermionic dark matter case, we explore the parameter space of the seesaw Yukawa couplings, the neutrino portal couplings and the dark scalar to dark fermion mass ratio in which the correct dark matter relic abundance can be produced by the freeze-in mechanism. By considering the mixing between the Standard Model neutrinos and the heavy neutrino, a connection can be built between dark matter production and laboratory experiments.
Dark matter, Seesaw mechanism, Neutrino, Sterile neutrino, Dirac neutrino, Higgs doublet, Neutrino mass, Yukawa coupling, Dark Higgs boson, Dark fermions, ...
• #### Dark matter produced from neutrinos

In the presence of interactions between neutrinos and dark matter (DM), DM can potentially be produced via freeze-in from the neutrino sector. We investigate the implications of such a scenario for the evolution of both DM and neutrinos in the early Universe, and show that the future cosmic neutrino detection experiment PTOLEMY might be sensitive to neutrino signals that originate from DM annihilation in this model.
Neutrino, Dark matter, Dark matter particle, Dark matter annihilation, Fermi-Dirac statistics, Boltzmann transport equation, Freeze-in, Relic abundance, Galactic Center, Milky Way, ...
• #### One-loop Gluon Amplitudes in AdS

We initiate the study of one-loop gluon amplitudes in AdS space. These amplitudes were recently computed at tree level for a variety of backgrounds of the form $AdS_{d+1} \times S^3$. For concreteness, we compute the one-loop correction to the massless gluon amplitude on $AdS_5\times S^3$, which corresponds to the four-point correlator of the flavor current multiplet in the dual 4d $\mathcal{N}=2$ SCFT. This requires solving a mixing problem that involves tree-level amplitudes of arbitrarily massive Kaluza-Klein modes. The final answer has the same color structure as in flat space but the dependence on Mandelstam variables is more complicated, with logarithms replaced by polygamma functions.
Anti de Sitter space, One-loop amplitude, Supergravity, Graviton, Unitarity, Box integrals, Scaling dimension, Central charge, Binary population synthesis, Anomalous dimension, ...
• #### Minimal Cycle Representatives in Persistent Homology using Linear Programming: an Empirical Study with User's Guide (ver. 3)

Cycle representatives of persistent homology classes can be used to provide descriptions of topological features in data. However, the non-uniqueness of these representatives creates ambiguity and can lead to many different interpretations of the same set of classes. One approach to solving this problem is to optimize the choice of representative against some measure that is meaningful in the context of the data. In this work, we provide a study of the effectiveness and computational cost of several $\ell_1$-minimization optimization procedures for constructing homological cycle bases for persistent homology with rational coefficients in dimension one, including uniform-weighted and length-weighted edge-loss algorithms as well as uniform-weighted and area-weighted triangle-loss algorithms. We conduct these optimizations via standard linear programming methods, applying general-purpose solvers to optimize over column bases of simplicial boundary matrices. Our key findings are: (i) optimization is effective in reducing the size of cycle representatives, (ii) the computational cost of optimizing a basis of cycle representatives exceeds the cost of computing such a basis in most data sets we consider, (iii) the choice of linear solver strongly affects the computation time of optimizing cycles, (iv) with the Gurobi linear solver, the computation time of solving an integer program is not significantly longer than that of solving a linear program for most of the cycle representatives, (v) strikingly, whether or not integer solutions are required, we almost always obtain a solution with the same cost, and almost all solutions found have entries in {-1, 0, 1} and are therefore also solutions to a restricted $\ell_0$ optimization problem, and (vi) we obtain qualitatively different results for generators in Erdős-Rényi random clique complexes.
Persistent homology, Linear optimization, Optimization, Programming, Algorithms, Matrices, Dimensions, ...
• #### UniK-QA: Unified Representations of Structured and Unstructured Knowledge for Open-Domain Question Answering (ver. 2)

We study open-domain question answering with structured, unstructured and semi-structured knowledge sources, including text, tables, lists and knowledge bases. Departing from prior work, we propose a unifying approach that homogenizes all sources by reducing them to text and applies the retriever-reader model, which has so far been limited to text sources only. Our approach improves results on knowledge-base QA tasks by 11 points compared to the latest graph-based methods. More importantly, we demonstrate that our unified knowledge (UniK-QA) model is a simple yet effective way to combine heterogeneous sources of knowledge, advancing the state-of-the-art results on two popular question answering benchmarks, NaturalQuestions and WebQuestions, by 3.5 and 2.6 points, respectively.
Knowledge base, Graph, Architecture, Knowledge graph, Attention, Google.com, Engineering, Natural language, Application programming interface, Inference, ...
• #### Training with Quantization Noise for Extreme Model Compression (ver. 3)

We tackle the problem of producing compact models, maximizing their accuracy for a given model size. A standard solution is to train networks with Quantization Aware Training, where the weights are quantized during training and the gradients approximated with the Straight-Through Estimator (STE). In this paper, we extend this approach beyond int8 fixed-point quantization to extreme compression methods, such as Product Quantization, where the approximations introduced by STE are severe. Our proposal is to quantize only a different random subset of weights during each forward pass, allowing unbiased gradients to flow through the other weights. Controlling the amount and form of the noise allows for extreme compression rates while maintaining the performance of the original model. As a result, we establish new state-of-the-art trade-offs between accuracy and model size in both natural language processing and image classification. For example, applying our method to state-of-the-art Transformer and ConvNet architectures, we achieve 82.5% accuracy on MNLI by compressing RoBERTa to 14MB and 80.0% top-1 accuracy on ImageNet by compressing an EfficientNet-B3 to 3.3MB.
Quantization, Architecture, Inference, Attention, Computational linguistics, Statistical estimator, Arithmetic, Sentence representations, Convolution Neural Network, Compact modeling, ...
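
The core idea above is to quantize only a random subset of weights in each forward pass. The minimal sketch below illustrates that masking step and nothing more. It makes three simplifying assumptions. Symmetric scalar rounding stands in for int8 or Product Quantization. The noise is applied per individual weight, although the paper's noise form may differ. Only the forward pass is shown: in an autodiff framework the quantized entries would use the straight-through estimator, while the untouched entries keep exact gradients.

```python
import random
from typing import List, Sequence


def scalar_quantize(w: float, scale: float, bits: int = 8) -> float:
    """Symmetric uniform quantization of one weight (stand-in for int8 / PQ)."""
    qmax = 2 ** (bits - 1) - 1
    q = max(-qmax, min(qmax, round(w / scale)))
    return q * scale


def quant_noise(weights: Sequence[float], p: float, scale: float, rng: random.Random) -> List[float]:
    """Weights used for ONE forward pass: each is quantized with probability p.

    A fresh random subset is drawn on every call; unselected weights stay in
    full precision.
    """
    return [scalar_quantize(w, scale) if rng.random() < p else w for w in weights]
```

Setting `p = 0` recovers ordinary full-precision training, and `p = 1` recovers standard quantization-aware training on every weight. Intermediate values trade off between the two.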
• #### Neural Discrete Representation Learning (ver. 2)

Learning useful representations without supervision remains a key challenge in machine learning. In this paper, we propose a simple yet powerful generative model that learns such discrete representations. Our model, the Vector Quantised-Variational AutoEncoder (VQ-VAE), differs from VAEs in two key ways: the encoder network outputs discrete, rather than continuous, codes; and the prior is learnt rather than static. In order to learn a discrete latent representation, we incorporate ideas from vector quantisation (VQ). Using the VQ method allows the model to circumvent issues of "posterior collapse" -- where the latents are ignored when they are paired with a powerful autoregressive decoder -- typically observed in the VAE framework. Pairing these representations with an autoregressive prior, the model can generate high-quality images, videos, and speech, as well as perform high-quality speaker conversion and unsupervised learning of phonemes, providing further evidence of the utility of the learnt representations.
Embedding, Latent variable, Generative model, Autoencoder, Statistical estimator, Relaxation, Unsupervised learning, Machine learning, Architecture, Inference, ...
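
The step that makes the VQ-VAE code discrete is a nearest-neighbour lookup. Each continuous encoder output is replaced by the closest vector in a learnt codebook, and the index of that entry is the discrete code. The minimal sketch below shows only this lookup. It omits the encoder, the decoder, the codebook and commitment losses, and the straight-through gradient copy used in training. The toy codebook is hypothetical.

```python
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def squared_distance(a: Vector, b: Vector) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def vector_quantize(z_e: Sequence[Vector], codebook: Sequence[Vector]) -> Tuple[List[int], List[List[float]]]:
    """Map each encoder output to its nearest codebook entry.

    Returns the discrete codes (indices) and the quantized vectors z_q.
    """
    codes = [
        min(range(len(codebook)), key=lambda j: squared_distance(z, codebook[j]))
        for z in z_e
    ]
    return codes, [list(codebook[k]) for k in codes]
```

With the codebook `[[0, 0], [1, 1], [5, 5]]`, the encoder outputs `[[0.2, 0.1], [4, 4.5], [0.9, 1.2]]` map to codes `[0, 2, 1]`. These indices are what an autoregressive prior would then be trained to model.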
• #### Can a strong radio burst escape the magnetosphere of a magnetar? (ver. 2)

We examine the possibility that fast radio bursts (FRBs) are emitted inside the magnetosphere of a magnetar. On its way out, the radio wave must interact with a low-density $e^\pm$ plasma in the outer magnetosphere at radii $R=10^9$-$10^{10}\,$cm. In this region, the magnetospheric particles have a huge cross section for scattering the wave. As a result, the wave strongly interacts with the magnetosphere and compresses it, depositing the FRB energy into the compressed field and the scattered radiation. The scattered spectrum extends to the $\gamma$-ray band and triggers an $e^\pm$ avalanche, further boosting the opacity. These processes choke FRBs, disfavoring scenarios with a radio source confined at $R\ll 10^{10}\,$cm. Observed FRBs can be emitted by magnetospheric flare ejecta transporting energy to large radii.
• #### The physics of relativistic jets

Highlights in the field of relativistic jets are reviewed and critically analyzed. Given the extent of the available literature, this essay symbolically takes the baton from the outstanding recent review by Blandford, Meier, and Readhead (2019). Therefore, I focus mostly on results published in the last few years, with specific reference to jets from active galactic nuclei. I conclude with some criticism and advice, which can be extended to current science at large.
Astrophysical jet, Active Galactic Nuclei, Relativistic jet, Black hole, Accretion disk, Narrow-line seyfert 1 galaxy, Programming, Dissipation, Very Long Baseline Array, X-ray binary, ...
• #### Very-High-Energy Emission From Pulsars

Air-Cherenkov telescopes have detected pulsations at energies above 50 GeV from a growing number of Fermi pulsars. These include the Crab, Vela, PSR B1706-44 and Geminga, with the first two having pulsed detections above 1 TeV. In some cases, there appears to be very-high-energy (VHE) emission that is an extension of the Fermi spectra to high energies, while in other cases, additional higher-energy spectral components that require a separate emission mechanism may be present. We present results of broad-band spectral modeling using global magnetosphere fields and multiple emission mechanisms that include synchro-curvature (SC) and inverse Compton scattered (ICS) radiation from accelerated particles (primaries) and synchrotron-self Compton (SSC) emission from lower-energy pairs. Our models predict three distinct VHE components: SC from primaries, whose high-energy tail can extend to 100 GeV; SSC from pairs, which can extend to several TeV; and ICS of pair synchrotron radiation by primary particles accelerated in the current sheet, which appears beyond 10 TeV. Our models suggest that H.E.S.S.-II and MAGIC have detected the high-energy tail of the primary SC component that produces the Fermi spectrum in Vela, Geminga and PSR B1706-44. We argue that the ICS component peaking above 10 TeV from Vela has been seen by H.E.S.S. Detection of this emission component from the Crab and other pulsars is possible with HAWC and CTA, and would directly measure the maximum particle energy in pulsars.
Inverse Compton, Pulsar, Light curve, Geminga, Synchrotron Self-Compton radiation, Curvature, Spectral energy distribution, MAGIC telescope, Magnetosphere of a star, Phase space caustic, ...
• #### Escape of Fast Radio Bursts from magnetars' magnetospheres

We discuss dissipative processes occurring during the production and escape of Fast Radio Bursts (FRBs) from magnetars' magnetospheres, the presumed loci of FRBs. High magnetic fields are required in the emission region, both to account for the overall energetics of FRBs and to suppress "normal" (non-coherent) radiative losses of radio-emitting particles; this limits the emission radii to $\leq {\rm few} \times 10 R_{NS}$. Radiative losses by particles in the strong FRB pulse may occur in the outer regions of the magnetosphere for longer rotation periods, $P\geq 1$ second. These losses are suppressed by several effects: (i) the ponderomotive pre-acceleration of background plasma along the direction of wave propagation (losses reduced approximately as $\gamma_\parallel^{3}$: smaller frequency, $\propto \gamma_\parallel^2$ in power, and time scales stretched, $\propto \gamma_\parallel$); this acceleration is non-dissipative and is reversed on the declining part of the pulse; (ii) Landau-Pomeranchuk-Migdal effects (long radiation formation length and ensuing destructive interference of scattered waves). In some cases, an FRB pulse may be dissipated on external perturbations (e.g., an incoming pulse of Alfven waves): this may produce a pulse of UV/soft X-rays, a swan song of an FRB, possibly detectable by Chandra.
Fast Radio Bursts, Magnetar, Magnetosphere of a star, Wave propagation, Soft X-ray, Landau-Pomeranchuk-Migdal effect, Chandra X-ray Observatory, Interference, Dissipation, Intensity, ...

• #### Salient Phrase Aware Dense Retrieval: Can a Dense Retriever Imitate a Sparse One?

Despite their recent popularity and well-known advantages, dense retrievers still lag behind sparse methods such as BM25 in their ability to reliably match salient phrases and rare entities in the query. It has been argued that this is an inherent limitation of dense models. We disprove this claim by introducing the Salient Phrase Aware Retriever (SPAR), a dense retriever with the lexical matching capacity of a sparse model. In particular, we show that a dense retriever $\Lambda$ can be trained to imitate a sparse one, and SPAR is built by augmenting a standard dense retriever with $\Lambda$. When evaluated on five open-domain question answering datasets and the MS MARCO passage retrieval task, SPAR sets a new state of the art for dense and sparse retrievers and can match or exceed the performance of more complicated dense-sparse hybrid systems.
Sparsity, Architecture, Rank, Training set, Inference, Embedding, Distillation, Bag of words model, Nearest neighbor search, Transformer, ...

Retrieving relevant contexts from a large corpus is a crucial step for tasks such as open-domain question answering and fact checking. Although neural retrieval outperforms traditional methods like tf-idf and BM25, its performance degrades considerably when applied to out-of-domain data. Driven by the question of whether a neural retrieval model can be universal and perform robustly on a wide variety of problems, we propose a multi-task trained model. Our approach not only outperforms previous methods in the few-shot setting, but also rivals specialised neural retrievers, even when in-domain training data is abundant. With the help of our retriever, we improve existing models for downstream tasks and closely match or improve the state of the art on multiple benchmarks.
Training set, Architecture, Fact checking, Computational linguistics, Keyphrase, Sparsity, Information retrieval, Yet Another Great Ontology, Knowledge base, Natural language, ...
• #### Joint Verification and Reranking for Open Fact Checking Over Tables (ver. 2)

Structured information is an important knowledge source for automatic verification of factual claims. Nevertheless, the majority of existing research into this task has focused on textual data, and the few recent inquiries into structured data have been for the closed-domain setting where appropriate evidence for each claim is assumed to have already been retrieved. In this paper, we investigate verification over structured data in the open-domain setting, introducing a joint reranking-and-verification model which fuses evidence documents in the verification component. Our open-domain model achieves performance comparable to the closed-domain state-of-the-art on the TabFact dataset, and demonstrates performance gains from the inclusion of multiple tables as well as a significant improvement over a heuristic retrieval baseline.
Attention, Fact checking, Rank, Ranking, Embedding, Entropy, Programming, Natural language inference, Synthetic Data, Graph Neural Network, ...
• #### Answering Complex Open-Domain Questions with Multi-Hop Dense Retrieval (ver. 2)

We propose a simple and efficient multi-hop dense retrieval approach for answering complex open-domain questions, which achieves state-of-the-art performance on two multi-hop datasets, HotpotQA and multi-evidence FEVER. Contrary to previous work, our method does not require access to any corpus-specific information, such as inter-document hyperlinks or human-annotated entity markers, and can be applied to any unstructured text corpus. Our system also yields a much better efficiency-accuracy trade-off, matching the best published accuracy on HotpotQA while being 10 times faster at inference time.
Graph, Information retrieval, Inference, Attention, Text corpus, Query reformulation, Generative model, Passage distribution, Ground truth, Knowledge base, ...
• #### Dense Passage Retrieval for Open-Domain Question Answering (ver. 3)

Open-domain question answering relies on efficient passage retrieval to select candidate contexts, where traditional sparse vector space models, such as TF-IDF or BM25, are the de facto method. In this work, we show that retrieval can be practically implemented using dense representations alone, where embeddings are learned from a small number of questions and passages by a simple dual-encoder framework. When evaluated on a wide range of open-domain QA datasets, our dense retriever outperforms a strong Lucene-BM25 system by a large margin, 9%-19% absolute in top-20 passage retrieval accuracy, and helps our end-to-end QA system establish a new state of the art on multiple open-domain QA benchmarks.
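
In a dual-encoder retriever, the question and each passage are encoded separately, and relevance is scored by the inner product of the two vectors. Retrieval is then a top-$k$ search over precomputed passage vectors. The minimal sketch below shows that scoring step. It also shows an in-batch-negatives loss, where each question's paired passage is the positive and the other passages in the batch act as negatives. The encoders themselves are assumed, and the toy vectors stand in for their outputs.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def dot(u: Vector, v: Vector) -> float:
    return sum(a * b for a, b in zip(u, v))


def retrieve(query_vec: Vector, passage_vecs: Sequence[Vector], k: int) -> List[int]:
    """Indices of the k passages with the highest inner-product score."""
    scores = [dot(query_vec, p) for p in passage_vecs]
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]


def in_batch_nll(query_vecs: Sequence[Vector], passage_vecs: Sequence[Vector]) -> float:
    """Mean negative log-likelihood where passage i is the positive for query i
    and every other passage in the batch is a negative."""
    total = 0.0
    for i, q in enumerate(query_vecs):
        scores = [dot(q, p) for p in passage_vecs]
        m = max(scores)  # subtract the max for a numerically stable log-sum-exp
        log_z = m + math.log(sum(math.exp(s - m) for s in scores))
        total += log_z - scores[i]
    return total / len(query_vecs)
```

For the query `[1, 0]` and passages `[[0, 1], [2, 0], [1, 1]]`, the scores are 0, 2 and 1, so `retrieve(..., k=2)` returns `[1, 2]`. At scale, the same inner-product search is normally delegated to an approximate nearest-neighbour index rather than a full sort.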
• #### Zero-Shot Dense Retrieval with Momentum Adversarial Domain Invariant Representations

Dense retrieval (DR) methods conduct text retrieval by first encoding texts in the embedding space and then matching them by nearest neighbor search. This requires strong locality properties from the representation space, i.e., each small group of relevant texts must be placed close together, a property that is hard to generalize to domains without sufficient training data. In this paper, we aim to improve the generalization ability of DR models from source training domains with rich supervision signals to target domains without any relevant labels, in the zero-shot setting. To achieve that, we propose Momentum adversarial Domain Invariant Representation learning (MoDIR), which introduces a momentum method in the DR training process to train a domain classifier distinguishing source versus target, and then adversarially updates the DR encoder to learn domain invariant representations. Our experiments show that MoDIR robustly outperforms its baselines on 10+ ranking datasets from the BEIR benchmark in the zero-shot setup, with more than 10% relative gains on datasets with enough sensitivity for DR models' evaluation. Source code of this paper will be released.
Classifier, Embedding, Hyperparameter, Ranking, Training set, Sparsity, Nearest neighbor search, Nearest-neighbor site, Attention, Vaccine, ...
• #### Sachdev-Ye-Kitaev Models and Beyond: A Window into Non-Fermi Liquids (ver. 2)

We present a review of the Sachdev-Ye-Kitaev (SYK) model of compressible quantum many-body systems without quasiparticle excitations, and its connections to various theoretical studies of non-Fermi liquids in condensed matter physics. The review is placed in the context of numerous experimental observations on correlated electron materials. Strong correlations in metals are often associated with their proximity to a Mott transition to an insulator created by the local Coulomb repulsion between the electrons. We explore the phase diagrams of a number of models of such local electronic correlation, employing a dynamical mean field theory in the presence of random spin exchange interactions. Numerical analyses and analytical solutions, using renormalization group methods and expansions in large spin degeneracy, lead to critical regions which display SYK physics. The models studied include the single-band Hubbard model, the $t$-$J$ model and the two-band Kondo-Heisenberg model in the presence of random spin exchange interactions. We also examine non-Fermi liquids obtained by considering each SYK model with random four-fermion interactions to be a multi-orbital atom, with the SYK-atoms arranged in an infinite lattice. We connect to theories of sharp Fermi surfaces without any low-energy quasiparticles in the absence of spatial disorder, obtained by coupling a Fermi liquid to a gapless boson; a systematic large $N$ theory of such a critical Fermi surface, with SYK characteristics, is obtained by averaging over an ensemble of theories with random boson-fermion couplings. Finally, we present an overview of the links between the SYK model and quantum gravity and end with an outlook on open questions.
• #### The Tail of Late-Forming Dwarf Galaxies in $\Lambda$CDM

We use a robust analytical model together with a high-resolution hydrodynamical cosmological simulation to demonstrate that in a $\Lambda$CDM Universe, a small fraction of dwarf galaxies inhabiting dark matter (DM) halos in the mass range $3\times 10^{9} \lesssim M_{200} / M_{\odot} \lesssim 10^{10}$ form unusually late ($z<3$) compared to the bulk population of galaxies. These galaxies originate from the interplay between the stochastic growth of DM halos and the existence of a time-dependent DM halo mass below which galaxies do not form. The formation epoch of the simulated late-forming galaxies traces remarkably well the time when their host DM halos first exceeded a non-trivial (but well-understood) time-dependent critical mass, thus making late-forming dwarfs attractive cosmological probes with constraining power over the past growth history of their host halos. The agreement between our model and the simulation results demonstrates that the population of simulated late-forming dwarfs is a robust cosmological outcome and largely independent of the specific galaxy formation model included in the simulations provided: 1) the Universe underwent cosmic reionization before $z_{\rm re} \sim 8$; 2) star formation proceeds in gas that self-gravitates; and 3) galaxy formation is largely restricted to atomic cooling halos before $z_{\rm re}$. The scarcity of massive late-forming dwarfs expected in $\Lambda$CDM implies that the great majority of bright, metal-poor, and actively star-forming dwarfs observed in our local Universe -- the most obvious candidates for these late-forming galaxies -- cannot be undergoing their formation for the first time at the present day in a $\Lambda$CDM Universe.
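The abstract's central criterion is that a dwarf forms when its host halo first exceeds a time-dependent critical mass. This can be sketched as a threshold-crossing search along a halo's mass history. The function `formation_redshift` and all numbers below are hypothetical and only illustrate the logic. The paper's actual critical-mass curve is not reproduced here.

```python
import numpy as np

def formation_redshift(z, halo_mass, crit_mass):
    """Redshift at which halo_mass first exceeds crit_mass, scanning forward in time.

    All arrays share one grid, with z sorted in decreasing order (early -> late).
    Returns None if the halo never crosses the threshold.
    """
    above = np.asarray(halo_mass) > np.asarray(crit_mass)
    if not above.any():
        return None
    return float(np.asarray(z)[np.argmax(above)])  # argmax returns the first True

# Hypothetical toy histories in solar masses, for illustration only.
z = np.array([8.0, 6.0, 4.0, 3.0, 2.0, 1.0, 0.0])
halo = np.array([1e8, 5e8, 1e9, 2e9, 4e9, 6e9, 8e9])
crit = np.array([3e9, 3.5e9, 4e9, 4.5e9, 5e9, 5.5e9, 6e9])
print(formation_redshift(z, halo, crit))  # 1.0, i.e. a late former (z < 3)
```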
• #### Semiclassical Boltzmann magnetotransport theory in anisotropic systems with a nonvanishing Berry curvature

Understanding the transport behavior of an electronic system under the influence of a magnetic field remains a key subject in condensed matter physics. Particularly in topological materials, their nonvanishing Berry curvatures can lead to many interesting phenomena in magnetotransport owing to the coupling between the magnetic field and the Berry curvature. By fully incorporating both the field-driven anisotropy and the inherent anisotropy in the band dispersion, we study semiclassical Boltzmann magnetotransport theory in topological materials with nonvanishing Berry curvatures. We show that the relaxation time is determined by an integral equation that includes the modified velocity arising from the coupling between the magnetic field and the Berry curvature.
• #### Connections between the Open-boundary Spectrum and Generalized Brillouin Zone in Non-Hermitian Systems

Periodic-boundary spectrum, open-boundary spectrum, and the generalized Brillouin zone (GBZ) are three essential properties of a one-dimensional non-Hermitian system. In this paper we illustrate that the deep connections between them can be revealed by a series of special similarity transformations. This viewpoint closely connects the topological geometry of the open-boundary spectrum with the GBZ and provides a new, efficient numerical method for calculating them accurately. We further extend these connections to non-Hermitian systems in the symplectic symmetry class. We show that if only the open-boundary features of a non-Hermitian system, such as the spectrum and the GBZ, are of concern, the relevant symmetry to consider is not that of the original system itself, but that of a system which has higher symmetry and is related to the original one by a similarity transformation.
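The similarity-transformation viewpoint can be made concrete with a textbook example, the Hatano-Nelson chain. This model is our choice and may not be one the paper treats. With asymmetric hoppings $t_R$ and $t_L$, the transformation $S=\mathrm{diag}(r^j)$ with $r=\sqrt{t_R/t_L}$ maps the open-boundary Hamiltonian onto a Hermitian chain with hopping $\sqrt{t_R t_L}$, so the open-boundary spectrum is real. The periodic-boundary spectrum $E(k)=t_R e^{-ik}+t_L e^{ik}$, by contrast, traces an ellipse in the complex plane. In this model the GBZ is the circle $|\beta| = r$. The sketch below checks the open-boundary statement numerically. The helper name `hatano_nelson_obc` and the parameter values are illustrative.

```python
import numpy as np

def hatano_nelson_obc(n, t_r, t_l):
    """Open-boundary Hatano-Nelson Hamiltonian with asymmetric nearest-neighbor hoppings."""
    h = np.zeros((n, n))
    for j in range(n - 1):
        h[j + 1, j] = t_r  # hopping from site j to j+1
        h[j, j + 1] = t_l  # hopping from site j+1 to j
    return h

n, t_r, t_l = 12, 1.0, 0.5
h = hatano_nelson_obc(n, t_r, t_l)

# Similarity transform S = diag(r^j) with r = sqrt(t_r / t_l) symmetrizes the hoppings.
r = np.sqrt(t_r / t_l)
s = np.diag(r ** np.arange(n))
h_sym = np.linalg.inv(s) @ h @ s
print(np.allclose(h_sym, h_sym.T))  # True: Hermitian chain with hopping sqrt(t_r * t_l)

# Hence the OBC spectrum is real and equals 2*sqrt(t_r*t_l)*cos(m*pi/(n+1)), m = 1..n.
eig = np.linalg.eigvals(h)
exact = 2 * np.sqrt(t_r * t_l) * np.cos(np.arange(1, n + 1) * np.pi / (n + 1))
numeric = np.sort(eig.real)
print(np.allclose(numeric, np.sort(exact)))
```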
• #### Flows on Metric Graphs with General Boundary Conditions

In this note we study the generation of $C_0$-semigroups by first order differential operators on $\mathrm{L}^p (\mathbb{R}_+,\mathbb{C}^{\ell})\times \mathrm{L}^p ([0,1],\mathbb{C}^{m})$ with general boundary conditions. In many cases we are able to characterize the generation property in terms of the invertibility of a matrix associated with the boundary conditions. The abstract results are used to study well-posedness of transport equations on non-compact metric graphs.
• #### Kepler motion on single-sheet hyperboloid

The classical Kepler-Coulomb problem on the single-sheeted hyperboloid $H^{3}_1$ is solved in the framework of the Hamilton--Jacobi equation. We have proven that all the bounded orbits are closed and periodic. The paths are ellipses or circles for finite motion.
• #### MLQA: Evaluating Cross-lingual Extractive Question Answering (ver. 3)

Question answering (QA) models have shown rapid progress enabled by the availability of large, high-quality benchmark datasets. Such annotated datasets are difficult and costly to collect, and rarely exist in languages other than English, making training QA systems in other languages challenging. An alternative to building large monolingual training datasets is to develop cross-lingual systems which can transfer to a target language without requiring training data in that language. In order to develop such systems, it is crucial to invest in high-quality multilingual evaluation benchmarks to measure progress. We present MLQA, a multi-way aligned extractive QA evaluation benchmark intended to spur research in this area. MLQA contains QA instances in 7 languages, namely English, Arabic, German, Spanish, Hindi, Vietnamese and Simplified Chinese. It consists of over 12K QA instances in English and 5K in each other language, with each QA instance being parallel between 4 languages on average. MLQA is built using a novel alignment context strategy on Wikipedia articles, and serves as a cross-lingual extension to existing extractive QA datasets. We evaluate current state-of-the-art cross-lingual representations on MLQA, and also provide machine-translation-based baselines. In all cases, transfer results lag significantly behind training-language performance.
• #### Stable, scalable, decentralized P2P file sharing with non-altruistic peers

P2P systems provide a scalable solution for distributing large files in a network. The file is split into many chunks, and peers contact other peers to collect missing chunks and eventually complete the entire file. The so-called 'rare chunk' phenomenon, where a single chunk becomes rare and prevents peers from completing the file, is a threat to the stability of such systems. Practical systems such as BitTorrent overcome this issue by requiring a global search for the rare chunk, which necessitates a centralized mechanism. We demonstrate a new system based on an approximate rare-chunk rule, allowing for completely distributed file sharing while retaining scalability and stability. We assume non-altruistic peers, and the seed is required to make only a minimal contribution.
• #### Large-Scale Study of Curiosity-Driven Learning

Reinforcement learning algorithms rely on carefully engineering environment rewards that are extrinsic to the agent. However, annotating each environment with hand-designed, dense rewards is not scalable, motivating the need for developing reward functions that are intrinsic to the agent. Curiosity is a type of intrinsic reward function which uses prediction error as reward signal. In this paper: (a) We perform the first large-scale study of purely curiosity-driven learning, i.e. without any extrinsic rewards, across 54 standard benchmark environments, including the Atari game suite. Our results show surprisingly good performance, and a high degree of alignment between the intrinsic curiosity objective and the hand-designed extrinsic rewards of many game environments. (b) We investigate the effect of using different feature spaces for computing prediction error and show that random features are sufficient for many popular RL game benchmarks, but learned features appear to generalize better (e.g. to novel game levels in Super Mario Bros.). (c) We demonstrate limitations of the prediction-based rewards in stochastic setups. Game-play videos and code are at https://pathak22.github.io/large-scale-curiosity/
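The curiosity reward in this abstract uses prediction error as the intrinsic reward, computed in a random feature space. A deliberately tiny model can illustrate the idea. This is a minimal sketch under stated assumptions. A fixed random projection with `tanh` stands in for the paper's random-feature network, and a linear forward model trained by plain SGD stands in for its learned dynamics model. The class name `RandomFeatureCuriosity` is hypothetical.

```python
import numpy as np

class RandomFeatureCuriosity:
    """Intrinsic reward = forward-model prediction error in a fixed random feature space."""

    def __init__(self, obs_dim, act_dim, feat_dim=16, lr=0.05, seed=0):
        rng = np.random.default_rng(seed)
        # Fixed random projection; it is never trained ("random features").
        self.proj = rng.normal(size=(obs_dim, feat_dim)) / np.sqrt(obs_dim)
        # Linear forward model mapping [phi(s), a] -> predicted phi(s').
        self.w = np.zeros((feat_dim + act_dim, feat_dim))
        self.lr = lr

    def features(self, obs):
        return np.tanh(obs @ self.proj)

    def reward_and_update(self, obs, act, next_obs):
        x = np.concatenate([self.features(obs), act])
        err = x @ self.w - self.features(next_obs)
        reward = 0.5 * float(err @ err)       # squared prediction error is the reward
        self.w -= self.lr * np.outer(x, err)  # SGD step on that same loss
        return reward

cur = RandomFeatureCuriosity(obs_dim=4, act_dim=2)
s, a, s_next = np.ones(4), np.array([1.0, 0.0]), np.array([0.5, -0.5, 1.0, 0.0])
rewards = [cur.reward_and_update(s, a, s_next) for _ in range(20)]
print(rewards[0] > rewards[-1])  # True: a repeatedly seen transition becomes "boring"
```

The forward model is trained on the same transitions it scores. As a result, repeatedly visited transitions stop being rewarding, and this is what pushes the agent toward novel states.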
• #### Diversity is All You Need: Learning Skills without a Reward Function (ver. 6)

Intelligent creatures can explore their environments and learn useful skills without supervision. In this paper, we propose DIAYN ('Diversity is All You Need'), a method for learning useful skills without a reward function. Our proposed method learns skills by maximizing an information theoretic objective using a maximum entropy policy. On a variety of simulated robotic tasks, we show that this simple objective results in the unsupervised emergence of diverse skills, such as walking and jumping. In a number of reinforcement learning benchmark environments, our method is able to learn a skill that solves the benchmark task despite never receiving the true task reward. We show how pretrained skills can provide a good parameter initialization for downstream tasks, and can be composed hierarchically to solve complex, sparse reward tasks. Our results suggest that unsupervised discovery of skills can serve as an effective pretraining mechanism for overcoming challenges of exploration and data efficiency in reinforcement learning.
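DIAYN's information-theoretic objective is optimized by giving the policy the pseudo-reward $r_z(s) = \log q_\phi(z \mid s) - \log p(z)$. Here $q_\phi$ is a learned skill discriminator and $p(z)$ is the skill prior. The sketch below computes this reward from discriminator logits, assuming a uniform prior over $K$ skills. The discriminator network and the maximum-entropy policy update are omitted, and `diayn_reward` is a hypothetical helper name.

```python
import numpy as np

def diayn_reward(disc_logits: np.ndarray, skill: int) -> float:
    """Pseudo-reward log q(z|s) - log p(z), with a uniform prior over K skills.

    disc_logits: unnormalized discriminator scores over the K skills for state s.
    """
    k = disc_logits.shape[0]
    shifted = disc_logits - disc_logits.max()         # numerically stable log-softmax
    log_q = shifted - np.log(np.exp(shifted).sum())
    log_p = -np.log(k)                                # uniform prior p(z) = 1/K
    return float(log_q[skill] - log_p)

print(diayn_reward(np.zeros(4), skill=2))                       # 0.0: skills indistinguishable
print(diayn_reward(np.array([100.0, 0.0, 0.0, 0.0]), skill=0))  # ~log(4): skill identified
```

The reward is $0$ when the discriminator cannot tell skills apart. It approaches $\log K$ when the discriminator identifies the skill with certainty.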
• #### Visual Reinforcement Learning with Imagined Goals (ver. 2)

For an autonomous agent to fulfill a wide range of user-specified goals at test time, it must be able to learn broadly applicable and general-purpose skill repertoires. Furthermore, to provide the requisite level of generality, these skills must handle raw sensory input such as images. In this paper, we propose an algorithm that acquires such general-purpose skills by combining unsupervised representation learning and reinforcement learning of goal-conditioned policies. Since the particular goals that might be required at test-time are not known in advance, the agent performs a self-supervised "practice" phase where it imagines goals and attempts to achieve them. We learn a visual representation with three distinct purposes: sampling goals for self-supervised practice, providing a structured transformation of raw sensory inputs, and computing a reward signal for goal reaching. We also propose a retroactive goal relabeling scheme to further improve the sample-efficiency of our method. Our off-policy algorithm is efficient enough to learn policies that operate on raw image observations and goals for a real-world robotic system, and substantially outperforms prior techniques.
• #### Improved Residual Vector Quantization for High-dimensional Approximate Nearest Neighbor Search

Quantization methods have been introduced to perform large-scale approximate nearest neighbor search tasks. Residual Vector Quantization (RVQ) is one of the effective quantization methods. RVQ uses a multi-stage codebook learning scheme to lower the quantization error stage by stage. However, RVQ has two major limitations when applied to high-dimensional approximate nearest neighbor search: 1. the performance gain diminishes quickly with added stages; 2. encoding a vector with RVQ is actually NP-hard. In this paper, we propose an improved residual vector quantization (IRVQ) method. IRVQ learns the codebook at each stage with a hybrid of subspace clustering and warm-started k-means to keep the performance gain from dropping, and uses a multi-path encoding scheme to encode a vector with lower distortion. Experimental results on benchmark datasets show that our method substantially improves RVQ and delivers better performance than the state of the art.
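Greedy RVQ encodes a vector by picking the closest codeword at each stage and passing the residual on. A multi-path encoder instead keeps several partial encodings alive. The sketch below implements a generic beam search over stages. It is an illustrative stand-in and not necessarily IRVQ's exact multi-path scheme. It assumes the codebooks are already trained, so the subspace-clustering and warm-started k-means training is omitted. `rvq_encode` is a hypothetical name, and the random codebooks are toy data.

```python
import numpy as np

def rvq_encode(x, codebooks, beam_width=1):
    """Encode x with residual codebooks via beam search (beam_width=1 is greedy RVQ).

    codebooks: list of arrays, each of shape (K, d). Returns (codes, squared_error).
    """
    # Each beam entry: (squared error so far, codes chosen so far, current residual).
    beam = [(float(x @ x), (), x)]
    for cb in codebooks:
        candidates = []
        for _, codes, res in beam:
            new_res = res[None, :] - cb  # residual after subtracting each codeword
            errs = np.einsum("kd,kd->k", new_res, new_res)
            for k in range(cb.shape[0]):
                candidates.append((float(errs[k]), codes + (k,), new_res[k]))
        candidates.sort(key=lambda c: c[0])  # keep the lowest-distortion paths
        beam = candidates[:beam_width]
    err, codes, _ = beam[0]
    return list(codes), err

rng = np.random.default_rng(0)
codebooks = [rng.normal(size=(4, 8)) for _ in range(3)]  # 3 stages, K = 4 codewords each
x = rng.normal(size=8)
greedy_codes, greedy_err = rvq_encode(x, codebooks, beam_width=1)
best_codes, best_err = rvq_encode(x, codebooks, beam_width=4 ** 2)  # exhaustive here
print(greedy_err, best_err)
```

With `beam_width=1` this reduces to greedy RVQ. A beam of at least $K^{M-1}$ (for $M$ stages of $K$ codewords) makes the search exhaustive. Because optimal encoding is NP-hard, that is only feasible for toy sizes like this one.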
• #### Domain-matched Pre-training Tasks for Dense Retrieval

Pre-training on larger datasets with ever-increasing model size is now a proven recipe for increased performance across almost all NLP tasks. A notable exception is information retrieval, where additional pre-training has so far failed to produce convincing results. We show that, with the right pre-training setup, this barrier can be overcome. We demonstrate this by pre-training large bi-encoder models on 1) a recently released set of 65 million synthetically generated questions, and 2) 200 million post-comment pairs from a preexisting dataset of Reddit conversations made available by pushshift.io. We evaluate on a set of information retrieval and dialogue retrieval benchmarks, showing substantial improvements over supervised baselines.
• #### An improved lower bound for multicolor Ramsey numbers and the half-multiplicity Ramsey number problem (ver. 2)

The multicolor Ramsey number problem asks, for each pair of natural numbers $\ell$ and $t$, for the largest $n$ such that the edges of the complete graph $K_n$ admit an $\ell$-coloring with no monochromatic clique of size $t$. Recent works of Conlon-Ferber and Wigderson have improved the long-standing lower bound for this problem. We make a further improvement by replacing an explicit graph appearing in their constructions by a random graph. Graphs useful for this construction are exactly those relevant for a problem of Erdős on graphs with no large cliques and few large independent sets. We also make some basic observations about this problem.
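For the smallest nontrivial case, $\ell=2$ and $t=3$, the answer to the question in this abstract is $n=5$, since $R(3,3)=6$. A brute-force check makes the definition concrete. Color each edge of $K_5$ according to whether its endpoints are adjacent on a 5-cycle. Both color classes are then 5-cycles, so neither contains a triangle. This only illustrates the definition. Brute force is hopeless at the sizes relevant to the paper's lower-bound constructions, and `has_mono_clique` is a hypothetical helper.

```python
from itertools import combinations

def has_mono_clique(n, color, t):
    """True if the edge coloring color(u, v) of K_n contains a monochromatic K_t."""
    for verts in combinations(range(n), t):
        colors = {color(u, v) for u, v in combinations(verts, 2)}
        if len(colors) == 1:
            return True
    return False

def pentagon(u, v):
    """Color 0 if u, v are adjacent on the 5-cycle 0-1-2-3-4-0, else color 1."""
    return 0 if (v - u) % 5 in (1, 4) else 1

print(has_mono_clique(5, pentagon, 3))  # False, so R(3,3) > 5
```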
• #### Dwarf stellar haloes: a powerful probe of small scale galaxy formation and the nature of dark matter

We use N-body cosmological simulations and empirical galaxy models to study the merger history of dwarf-mass galaxies (with $M_{\rm halo}\sim 10^{10}\,M_\odot$). Our input galaxy models describe the stellar mass-halo mass relation and the galaxy occupation fraction. The number of major and minor mergers depends on the type of dark matter; in particular, minor mergers are greatly suppressed in warm dark matter models. In addition, the number of mergers that bring in stars is strongly dependent on the galaxy occupation model. For example, minor mergers are negligible for stellar halo growth in models with a high mass threshold for galaxy formation (i.e. $10^{9.3}\,M_\odot$ at $z=0$). Moreover, this threshold for galaxy formation can also determine the relative difference (if any) between the stellar haloes of satellite and field dwarfs. Using isolated simulations of dwarf-dwarf mergers, we show that the relative frequency of major and minor mergers predicts very different stellar haloes: typically, "intermediate" dark matter merger ratios ($\sim$1:5) maximise the growth of distant stellar haloes. We discuss the observability of dwarf stellar haloes and find that the surface brightness of these features is extremely faint. However, when several dwarfs are stacked together, models that form particularly rich stellar haloes could be detectable. Finally, we show that stellar streams in the Galactic halo overlapping in phase space with known dwarf satellites are likely remnants of their stripped stellar haloes. The mere existence of dwarf stellar haloes can already put constraints on some small-scale models, and thus observational probes should be a high priority.
• #### A Cosmological Underdensity Does Not Solve the Hubble Tension

A potential solution to the Hubble tension is the hypothesis that the Milky Way is located near the center of a matter underdensity. We model this scenario through the Lemaître-Tolman-Bondi formalism with the inclusion of a cosmological constant ($\Lambda$LTB) and consider a generalized Gaussian parametrization for the matter density profile. We constrain the underdensity and the background cosmology with a combination of data sets: the Pantheon Sample of type Ia supernovae (both the full catalogue and a redshift-binned version of it), a collection of baryon acoustic oscillations data points and the distance priors extracted from the latest Planck data release. The analysis with the binned supernovae suggests a preference for a $-13 \%$ density drop with a size of approximately 300 Mpc, interestingly matching the prediction for the so-called KBC void already identified on the basis of independent analyses using galaxy distributions. The constraints obtained with the full Pantheon Sample are instead compatible with a homogeneous cosmology and we interpret this radically different result as a cautionary tale about the potential bias introduced by employing a binned supernova data set. We quantify the level of improvement on the Hubble tension by analyzing the constraints on the B-band absolute magnitude of the supernovae, which provides the calibration for the local measurements of $H_0$. Since no significant difference is observed with respect to an analogous fit performed with a standard $\Lambda$CDM cosmology, we conclude that the potential presence of a local underdensity does not resolve the tension and does not significantly degrade current supernova constraints on $H_0$.
• #### Hints of dark matter-neutrino interactions in Lyman-$\alpha$ data

In this letter we investigate the possibility that dark matter and (massive) neutrinos can interact via a simple, constant cross section. Building on previous numerical efforts, we constrain this model with CMB, BAO and, in particular, Lyman-$\alpha$ data. We find that the latter hint at a significant departure from $\Lambda$CDM, with a preference for an interaction strength about 3$\sigma$ away from zero. We trace the origin of this preference back to the additional tilt that the interacting scenario can imprint on the Lyman-$\alpha$ flux, solving a well-known tension between early-time and Lyman-$\alpha$ probes. Future work including complementary Lyman-$\alpha$ data will be crucial in order to test these results.
• #### Convexity, large charge and the large-N phase diagram of the $\varphi^4$ theory

In this note we discuss the phase space of the O(2N) vector model in the presence of a quadratic and a quartic interaction by writing the large-N effective potential using large charge methods in dimensions $2<D<4$ and $4<D<6$. Based on a simple discussion of the convexity properties of the grand potential, we find very different behavior in the two regimes: while the theory is well-behaved in $2<D<4$, the model in $4<D<6$ leads to a complex CFT in the UV, consistent with earlier results. We also find a new metastable massive phase in the high-energy regime for the theory on the cylinder.
• #### Following the flow for large N and large charge

We discuss the O(2N) vector model in three dimensions. While this model flows to the Wilson-Fisher fixed point when fine-tuned, working in a double-scaling limit of large N and large charge allows us to study the model away from the critical point and even to follow the RG flow from the UV to the IR. The crucial observation is that the effective potential (at leading order in N but exact to all orders in perturbation theory) is the Legendre transform of the grand potential at fixed charge. This allows us to write an effective action and the free energy for generic values of the coupling in a very simple fashion and without evaluating any Feynman diagrams.
• #### On exact overlaps for $\mathfrak{gl}(N)$ symmetric spin chains

We study the integrable two-site states of quantum integrable models that are solvable by the nested algebraic Bethe ansatz and possess a $\mathfrak{gl}(N)$-invariant R-matrix. We investigate the overlaps between the integrable two-site states and the wave functions. Finding exact derivations of the factorized overlap formulas for nested integrable systems is a long-standing unsolved problem. In this paper we give a derivation for a large class of integrable states of the $\mathfrak{gl}(N)$ symmetric spin chain. The first part of the derivation is to calculate recursion relations for the off-shell overlap that uniquely fix it. Using these recursions, we prove that the normalized overlaps of the multi-particle states have factorized forms which contain the products of the one-particle overlaps and the ratio of the Gaudin-like determinants. We also show that the previously proposed overlap formulas agree with our general formula.
• #### $J\bar T$-deformed CFTs as non-local CFTs

Various holographic set-ups in string theory suggest the existence of non-local, UV complete two-dimensional QFTs that possess Virasoro symmetry, in spite of their non-locality. We argue that $J\bar T$-deformed CFTs are the first concrete realisation of such "non-local CFTs", through a detailed analysis of their classical and quantum symmetry algebra. Classically, the symmetries consist of an infinite set of left-moving conformal and affine $U(1)$ transformations that generate a Witt-Kac-Moody algebra, as well as a set of non-local, field-dependent generalizations of right-moving conformal and affine $U(1)$ transformations, whose algebra depends on the chosen basis. Notably, there exists a basis, denoted as the "flowed" representation, in which the right-moving charge algebra is simply Witt-Kac-Moody. At the quantum level, we provide a concrete prescription for constructing the symmetry generators via a combination of the flow equations they satisfy and the Sugawara construction, and use this to explicitly resolve the ordering ambiguities and the quantum corrections to the generators up to second order in the $J\bar T$ coupling parameter. This construction naturally produces the "flowed" generators, whose algebra is Virasoro-Kac-Moody to all orders in the coupling, with the same central extension as that of the undeformed CFT. We use this input to work out the quantum modifications to the "unflowed" generator algebra. A peculiarity of the Virasoro generators we study is that their zero mode does not equal the Hamiltonian, but is a quadratic function of it; this helps reconcile the Virasoro symmetry with the non-locality of the model. We argue that $T\bar T$-deformed CFTs also possess Virasoro symmetry, and discuss the existence of such a symmetry in more general non-local QFTs.
• #### Persistent Homology of Graph Embeddings (ver. 2)

Popular network models such as the mixed membership and standard stochastic block model are known to exhibit distinct geometric structure when embedded into $\mathbb{R}^{d}$ using spectral methods. The resulting point cloud concentrates around a simplex in the first model, whereas it separates into clusters in the second. By adopting the formalism of generalised random dot-product graphs, we demonstrate that both of these models, and different mixing regimes in the case of mixed membership, may be distinguished by the persistent homology of the underlying point distribution in the case of adjacency spectral embedding. Moreover, despite non-identifiability issues, we show that the persistent homology of the support of the distribution and its super-level sets can be consistently estimated. As an application of our consistency results, we provide a topological hypothesis test for distinguishing the standard and mixed membership stochastic block models.
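Adjacency spectral embedding, the embedding step this abstract builds on, takes only a few lines. Take the $d$ eigenvectors of the adjacency matrix with the largest-magnitude eigenvalues and scale each by the square root of its eigenvalue's magnitude. The sketch below is our own minimal version, and the name `adjacency_spectral_embedding` is hypothetical. The persistent-homology computation on the resulting point cloud is not shown. Applied to the expected adjacency matrix of a two-block stochastic block model, the embedding yields exactly two distinct points, which is the noise-free form of the cluster structure described in the abstract. The block matrix used here is positive definite, so $XX^\top$ recovers the matrix exactly. In the indefinite, generalised random dot-product graph case, recovery holds only up to a signature matrix.

```python
import numpy as np

def adjacency_spectral_embedding(a: np.ndarray, d: int) -> np.ndarray:
    """Embed the nodes of a symmetric (weighted) adjacency matrix into R^d.

    Row i is node i's position: the eigenvectors of the d largest-|eigenvalue|
    eigenvalues, each scaled by sqrt(|eigenvalue|).
    """
    vals, vecs = np.linalg.eigh(a)
    idx = np.argsort(-np.abs(vals))[:d]
    return vecs[:, idx] * np.sqrt(np.abs(vals[idx]))

# Expected adjacency of a 2-block SBM with 3 nodes per block; b is a hypothetical block matrix.
b = np.array([[0.5, 0.1], [0.1, 0.4]])
z = np.repeat(np.eye(2), 3, axis=0)  # 6 x 2 one-hot block memberships
p = z @ b @ z.T
x = adjacency_spectral_embedding(p, d=2)
print(np.allclose(x @ x.T, p))  # True: b is positive definite, so X X^T recovers P
```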
• #### ResNet strikes back: An improved training procedure in timm

The influential Residual Networks designed by He et al. remain the gold-standard architecture in numerous scientific publications. They typically serve as the default architecture in studies, or as baselines when new architectures are proposed. Yet there has been significant progress on best practices for training neural networks since the inception of the ResNet architecture in 2015. Novel optimization and data-augmentation techniques have increased the effectiveness of training recipes. In this paper, we re-evaluate the performance of the vanilla ResNet-50 when trained with a procedure that integrates such advances. We share competitive training settings and pre-trained models in the timm open-source library, with the hope that they will serve as better baselines for future work. For instance, with our more demanding training setting, a vanilla ResNet-50 reaches 80.4% top-1 accuracy at resolution 224x224 on ImageNet-val without extra data or distillation. We also report the performance achieved with popular models with our training procedure.