Recently bookmarked papers

with concepts:
  • We study two procedures (reverse-mode and forward-mode) for computing the gradient of the validation error with respect to the hyperparameters of any iterative learning algorithm such as stochastic gradient descent. These procedures mirror two methods of computing gradients for recurrent neural networks and have different trade-offs in terms of running time and space requirements. Our formulation of the reverse-mode procedure is linked to previous work by Maclaurin et al. [2015] but does not require reversible dynamics. The forward-mode procedure is suitable for real-time hyperparameter updates, which may significantly speed up hyperparameter optimization on large datasets. We present experiments on data cleaning and on learning task interactions. We also present one large-scale experiment where the use of previous gradient-based methods would be prohibitive.
    Hyperparameter · Optimization · Training set · Neural network · Stochastic gradient descent · Recurrent neural network · Entropy · Reversible dynamics · Regularization · Support vector machine...
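The forward-mode procedure described above can be sketched in a few lines: alongside each SGD step, propagate the tangent $Z_t = \partial w_t/\partial \eta$ of the parameters with respect to the learning rate, then contract it with the validation gradient. A minimal sketch, assuming an illustrative quadratic training loss and a squared-norm validation loss (both toy choices, not from the paper):

```python
import numpy as np

# Forward-mode hypergradient for the SGD learning rate eta.
# Toy setup (illustrative): training loss 0.5 * w^T A w,
# validation loss 0.5 * ||w||^2, so all gradients are exact.
A = np.diag([1.0, 10.0])       # Hessian of the quadratic training loss
w = np.array([1.0, 1.0])       # model parameters
Z = np.zeros(2)                # tangent Z_t = dw_t/d(eta)
eta = 0.05                     # the hyperparameter being differentiated

for _ in range(100):
    g = A @ w                  # training gradient at w_t
    Z = Z - g - eta * (A @ Z)  # d/d(eta) of the update w - eta * g
    w = w - eta * g            # ordinary SGD step

hypergrad = w @ Z              # d(val loss)/d(eta); val gradient is w
```

Here the hypergradient comes out negative: within the stable range, a larger η speeds convergence and lowers the validation error, so a real-time hyperparameter update of the kind the abstract mentions would increase η.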
  • Most models in machine learning contain at least one hyperparameter to control for model complexity. Choosing an appropriate set of hyperparameters is both crucial for model accuracy and computationally challenging. In this work we propose an algorithm for the optimization of continuous hyperparameters using inexact gradient information. An advantage of this method is that hyperparameters can be updated before the model parameters have fully converged. We also give sufficient conditions for the global convergence of this method, based on regularity conditions of the involved functions and summability of errors. Finally, we validate the empirical performance of this method on the estimation of regularization constants of $L^2$-regularized logistic regression and kernel ridge regression. Empirical benchmarks indicate that our approach is highly competitive with respect to state-of-the-art methods.
    Hyperparameter · Optimization · Regularization · Logistic regression · Regression · Training set · Machine learning · Goodness of fit · Least squares · Python...
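For the special case of $L^2$-regularized least squares, the hyperparameter gradient in question has a closed form via implicit differentiation of the training solution $w(\lambda) = (X^\top X + \lambda I)^{-1} X^\top y$. A toy sketch (random data, step size, and the exact-gradient simplification are all illustrative assumptions; the paper's point is precisely that inexact versions of this gradient already suffice):

```python
import numpy as np

# Hypergradient descent on the ridge regularization constant lam.
# dw/d(lam) = -(X^T X + lam I)^{-1} w by implicit differentiation.
rng = np.random.default_rng(0)
Xtr, ytr = rng.standard_normal((50, 3)), rng.standard_normal(50)
Xva, yva = rng.standard_normal((30, 3)), rng.standard_normal(30)

lam = 1.0
for _ in range(100):
    A = Xtr.T @ Xtr + lam * np.eye(3)
    w = np.linalg.solve(A, Xtr.T @ ytr)        # inner training solve
    dw_dlam = -np.linalg.solve(A, w)           # implicit differentiation
    val_grad = Xva.T @ (Xva @ w - yva) / len(yva)
    lam = max(lam - 0.1 * (val_grad @ dw_dlam), 1e-6)  # hypergradient step
```

In the inexact-gradient setting of the paper, the inner solve would be replaced by a few optimization steps, with the hyperparameter updated before the inner problem converges.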
  • We introduce a general method for improving the convergence rate of gradient-based optimizers that is easy to implement and works well in practice. We demonstrate the effectiveness of the method in a range of optimization problems by applying it to stochastic gradient descent, stochastic gradient descent with Nesterov momentum, and Adam, showing that it significantly reduces the need for the manual tuning of the initial learning rate for these commonly used algorithms. Our method works by dynamically updating the learning rate during optimization using the gradient with respect to the learning rate of the update rule itself. Computing this "hypergradient" needs little additional computation, requires only one extra copy of the original gradient to be stored in memory, and relies upon nothing more than what is provided by reverse-mode automatic differentiation.
    Optimization · Hyperparameter · Stochastic gradient descent · Logistic regression · Neural network · Convolution Neural Network · Architecture · Hidden layer · Deep Neural Networks · Saturnian satellites...
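The update rule described in the abstract is one line: the learning rate moves along the gradient of the update rule with respect to itself, which for plain SGD reduces to a dot product of successive gradients. A minimal sketch on a toy quadratic (the objective and the hyper-learning-rate β are illustrative choices):

```python
import numpy as np

# "Hypergradient descent" for SGD: adapt the learning rate alpha
# online using grad_t . grad_{t-1}, the gradient of the update rule
# with respect to alpha. Toy objective f(w) = 0.5 * ||w||^2.
def grad(w):
    return w

w = np.array([5.0, -3.0])
alpha, beta = 0.01, 0.001          # initial lr and hyper-lr (assumed values)
g_prev = grad(w)                   # one extra gradient copy, as in the paper

for _ in range(200):
    g = grad(w)
    alpha = alpha + beta * (g @ g_prev)   # hypergradient update of alpha
    w = w - alpha * g                     # ordinary SGD step
    g_prev = g
```

The learning rate grows while successive gradients stay aligned and stops adapting as the iterates converge, which is the mechanism behind the reduced sensitivity to the initial learning rate.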
  • Hyperparameters of deep neural networks are often optimized by grid search, random search or Bayesian optimization. As an alternative, we propose to use the Covariance Matrix Adaptation Evolution Strategy (CMA-ES), which is known for its state-of-the-art performance in derivative-free optimization. CMA-ES has some useful invariance properties and is friendly to parallel evaluations of solutions. We provide a toy example comparing CMA-ES and state-of-the-art Bayesian optimization algorithms for tuning the hyperparameters of a convolutional neural network for the MNIST dataset on 30 GPUs in parallel.
    Optimization · Hyperparameter · Bayesian · Deep Neural Networks · MNIST dataset · Covariance matrix · Convolution Neural Network · Concurrence · Fitness model · Gaussian process...
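Full CMA-ES adapts a covariance matrix and step size with carefully derived weights; as a hedged illustration of the derivative-free idea only, here is a stripped-down (μ,λ) evolution strategy (isotropic sampling, naive step-size decay, toy "validation error") that should not be mistaken for the actual algorithm:

```python
import numpy as np

# Simplified (mu, lambda) evolution strategy in the spirit of CMA-ES
# (covariance adaptation omitted): sample candidates around the mean,
# recombine the best, shrink the step size. All names are illustrative.
rng = np.random.default_rng(0)

def val_error(h):                 # toy "validation error" of 2 hyperparameters
    return (h[0] - 0.3) ** 2 + (h[1] - 0.7) ** 2

mean, sigma, lam = np.zeros(2), 0.5, 8
for _ in range(60):
    pop = mean + sigma * rng.standard_normal((lam, 2))   # sample offspring
    scores = np.array([val_error(h) for h in pop])
    elite = pop[np.argsort(scores)[: lam // 2]]          # select best half
    mean = elite.mean(axis=0)                            # recombine
    sigma *= 0.95                                        # crude step-size decay
```

The parallelism the abstract mentions comes for free here: each generation's λ evaluations of `val_error` are independent and could run on separate GPUs.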
  • Tuning hyperparameters of learning algorithms is hard because gradients are usually unavailable. We compute exact gradients of cross-validation performance with respect to all hyperparameters by chaining derivatives backwards through the entire training procedure. These gradients allow us to optimize thousands of hyperparameters, including step-size and momentum schedules, weight initialization distributions, richly parameterized regularization schemes, and neural network architectures. We compute hyperparameter gradients by exactly reversing the dynamics of stochastic gradient descent with momentum.
    Regularization · Hyperparameter · Deep learning · Backpropagation · Arithmetic · Bayesian · Training set · Meta learning · Machine learning · Deep Neural Networks...
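The "exactly reversing" trick rests on momentum SGD being an invertible map: each update can be undone step by step, so intermediate training states need not be stored when chaining derivatives backwards. A sketch of the forward and backward dynamics on a toy loss (in exact arithmetic; the paper additionally handles the information lost to finite precision through the momentum decay):

```python
import numpy as np

# Momentum SGD and its exact inverse. Toy loss f(w) = 0.5 * ||w||^2
# (illustrative); any differentiable loss works the same way.
def grad(w):
    return w

eta, gamma, steps = 0.1, 0.9, 50
w0 = np.array([2.0, -1.0])
w, v = w0.copy(), np.zeros(2)

for _ in range(steps):                 # forward: momentum SGD
    v = gamma * v - eta * grad(w)
    w = w + v

for _ in range(steps):                 # backward: invert each update
    w = w - v                          # recover w_t from w_{t+1} = w_t + v_{t+1}
    v = (v + eta * grad(w)) / gamma    # recover v_t

# w now matches w0 up to floating-point round-off
```

Running the reverse pass while accumulating vector-Jacobian products gives the hyperparameter gradients without any stored trajectory.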
  • About 10 to 20 percent of massive stars may be kicked out of their natal clusters before exploding as supernovae. These "runaway stars" might play a crucial role in driving galactic outflows and enriching the circumgalactic medium with metals. To study this effect, we carry out high-resolution dwarf galaxy simulations that include velocity kicks to massive O/B stars above 8 M$_{\odot}$. We consider two scenarios: one that adopts a power-law distribution for kick velocities, resulting in more stars with high velocity kicks (the runaway scenario), and a more moderate scenario with a Maxwellian velocity distribution (the walkaway scenario). We explicitly resolve the multi-phase interstellar medium (ISM), and include non-equilibrium cooling and chemistry channels. We adopt a resolved feedback scheme (\textsc{Griffin}) where we sample individual massive stars from an IMF. We follow the lifetime of these stars and add their photoionising radiation, their UV radiation field, and their photoelectric heating rate to the surrounding gas. At the end of their lifetime we explode the massive population as core-collapse supernovae (CCSN). In the simulations with runaway massive stars, we add additional (natal) velocity kicks that mimic two- and three-body interactions that cannot be fully resolved in our simulations. We find that the inclusion of runaway or walkaway star scenarios has an impact on mass, metal, momentum and energy outflows as well as the respective loading factors: mass, metal and momentum loading increase by a factor of 2-3, whereas the mean energy loading increases by a factor of 5 in the runaway case and a factor of 3 in the walkaway case. Moreover, the peak values are increased by a factor of up to 10, independent of the adopted velocity kick model. We conclude that the inclusion of runaway stars could have a significant impact on the global outflow properties of dwarf galaxies.
    Runaway star · Star · Interstellar medium · Outflow rates · Dwarf galaxy · Massive stars · Supernova · Galaxy · Cooling · Star formation...
  • We study the evolution of single subhaloes with masses of $\sim10^9 M_\odot$ in a Milky-Way-sized host halo for self-interacting dark matter (SIDM) models. We perform idealized dark-matter-only N-body simulations of halo-subhalo mergers, varying the self-scattering cross section (including a velocity-dependent scenario), the subhalo orbit, and the internal properties of the subhalo. We calibrate a gravothermal fluid model against the simulations to predict the time evolution of the spherical mass density profiles of isolated SIDM haloes. We find that tidal effects on SIDM subhaloes can be described with a framework developed for collisionless dark matter, but minor revisions are necessary to explain our SIDM simulation results. As long as the cross section is less than $\sim10\, \mathrm{cm}^2/\mathrm{g}$ and a plausible range of subhalo density profiles at redshifts of $\sim2$ is assumed for the initial states, our simulations do not exhibit a prominent feature of gravothermal collapse in the subhalo central density for 10 Gyr. We develop a semi-analytic model of SIDM subhaloes in a time-evolving density core of the host, including tidal stripping and self-scattering ram pressure effects. Our semi-analytic approach provides a simple, efficient and physically intuitive prediction for SIDM subhaloes, but further improvements are needed to account for baryonic effects in the host and for the gravothermal instability accelerated by tidal stripping.
    Dark matter subhalo · Self-interacting dark matter · N-body simulation · Dark matter · Navarro-Frenk-White profile · Tidal stripping · Milky Way · Ram pressure · Dark matter particle · Calibration...
  • Most systems in nature operate far from equilibrium, displaying time-asymmetric, irreversible dynamics. When a system's elements are numerous, characterizing its nonequilibrium states is challenging due to the expansion of its state space. Inspired by the success of the equilibrium Ising model in investigating disordered systems in the thermodynamic limit, we study the nonequilibrium thermodynamics of the asymmetric Sherrington-Kirkpatrick system as a prototypical model of large-scale nonequilibrium processes. We employ a path integral method to calculate a generating functional over the trajectories to derive exact solutions of the order parameters, conditional entropy of trajectories, and steady-state entropy production of infinitely large networks. The order parameters reveal order-disorder nonequilibrium phase transitions as found in equilibrium systems but no dynamics akin to the spin-glass phase. We find that the entropy production peaks at the phase transition, but it is more prominent outside the critical regime, especially for disordered phases with low entropy rates. While entropy production is becoming popular to characterize various complex systems, our results reveal that increased entropy production is linked with radically different scenarios, and combining multiple thermodynamic quantities yields a more precise picture of a system. These results contribute to an exact analytical theory for studying the thermodynamic properties of large-scale nonequilibrium systems and their phase transitions.
    Entropy production · Entropy · Disorder · Phase transitions · Sherrington-Kirkpatrick model · Steady state · Generating functional · Ising model · Spin glass · Saddle point...
  • The nonequilibrium thermodynamic features of a Brownian motor are investigated by obtaining exact time-dependent solutions. This in turn enables us to investigate not only the long-time (steady-state) properties but also the short-time behavior of the system. General expressions for the free energy, the entropy production rate ${\dot e}_{p}(t)$, and the entropy extraction rate ${\dot h}_{d}(t)$ are derived for a system that is genuinely driven out of equilibrium by a time-independent force as well as by a spatially varying thermal background. We show that for a system operating between hot and cold reservoirs, most of the thermodynamic quantities approach a non-equilibrium steady state in the long-time limit, and the change in free energy becomes minimal at the steady state. However, for a system operating in a heat bath whose temperature varies linearly in space, the entropy production and extraction rates approach a non-equilibrium steady state while the change in free energy varies linearly in space. This reveals that, unlike systems at equilibrium, systems driven out of equilibrium may not minimize their free energy. The thermodynamic properties of a system operating between hot and cold baths are further compared and contrasted with those of a system operating in a heat bath whose temperature varies linearly in space along the reaction coordinate. We show that the entropy, entropy production, and extraction rates are considerably larger in the linearly varying temperature case than for a system operating between hot and cold baths, revealing that such systems are inherently irreversible. In both cases, in the presence of a load or when a distinct temperature difference is maintained, the entropy $S(t)$ increases monotonically with time and saturates to a constant value as $t$ increases further.
    Entropy production · Entropy · Steady state · Non-equilibrium steady states · Lattice (order) · Dissipation · Symmetry breaking · Brownian motor · Particle velocity · Temperature profile...
  • We study the behavior of stationary non-equilibrium two-body correlation functions for Diffusive Systems with equilibrium reference states (DSe). We describe a DSe at the mesoscopic level by $M$ locally conserved continuum fields that evolve through coupled Langevin equations with white noise. The dynamics are designed such that the system may reach equilibrium states for a set of boundary conditions; we then drive the system to a non-equilibrium stationary state by changing those boundary conditions. We decompose the correlations into a known local-equilibrium part and a part that contains the non-equilibrium behavior, which we call the {\it correlation's excess} $\bar C(x,z)$. We formally derive the differential equations for $\bar C$. To solve them order by order, we define a perturbative expansion around the equilibrium state. We show that $\bar C$'s first-order term, $\bar C^{(1)}$, is always zero in the single-field case, $M=1$, and is always either long-range or zero when $M>1$. Surprisingly, we show that the associated fluctuations, the space integrals of $\bar C^{(1)}$, are always zero; therefore, fluctuations are dominated by local equilibrium up to second order in the perturbative expansion around equilibrium. We derive the behavior of $\bar C^{(1)}$ in real space explicitly for dimensions $d=1$ and $2$. Finally, we derive the first two perturbative orders of the correlation's excess for a generic $M=2$ case and a hydrodynamic model.
    Entropy · Perturbative expansion · Two-point correlation function · Langevin equation · Hamilton-Jacobi equation · White noise · Real space · Evolution equation · Grand canonical ensemble · Partial differential equation...
  • We show that the liquid-state-theory configurational temperature $T_{\rm conf}$ defines an energy scale, which for active-matter models of point particles based on a potential-energy function can be used to determine how to adjust model parameters to achieve approximately invariant structure and dynamics when the density is changed. The required parameter changes are calculated from the variation of a single configuration's $T_{\rm conf}$ upon a uniform scaling of all coordinates. The formalism developed applies for models involving a potential-energy function with hidden scale invariance. The procedure is illustrated by computer simulations of the Kob-Andersen binary Lennard-Jones model with active Ornstein-Uhlenbeck dynamics in three dimensions and the two-dimensional single-component Yukawa model with active Brownian particle dynamics. For the latter model we also show how $T_{\rm conf}$ may be applied for estimating the motility-induced phase separation (MIPS) phase boundary, in effect reducing by one the dimension of the parameter space. We finally propose that the ratio between the equilibrium-system temperature corresponding to the actual potential energy and $T_{\rm conf}$ provides a useful measure of how far an active-matter system is from thermal equilibrium.
    Pair potential · Radial distribution functions · Scale invariance · Liquids · Phase diagram · Entropy · Deviations from equilibrium · Canonical ensemble · Phase separation · Effective temperature...
  • We explain equivariant neural networks, a notion underlying breakthroughs in machine learning from deep convolutional neural networks for computer vision to AlphaFold 2 for protein structure prediction, without assuming knowledge of equivariance or neural networks. The basic mathematical ideas are simple but are often obscured by engineering complications that come with practical realizations. We extract and focus on the mathematical aspects, and limit ourselves to a cursory treatment of the engineering issues at the end.
    Neural network · Engineering · Machine learning · Convolution Neural Network · Group action · Group representation · Protein · Deep convolutional neural networks · Image Processing · Dilute magnetic semiconductors...
  • This paper examines the determinants of the volatility of futures prices and basis for three commodities: gold, oil and bitcoin -- often dubbed solid, liquid and digital gold -- by using contract-by-contract analysis which has been previously applied to crude oil futures volatility investigations. By extracting the spot and futures daily prices as well as the maturity, trading volume and open interest data for the three assets from 18th December 2017 to 30th November 2021, we find a positive and significant role for trading volume and a possible negative influence of open interest, when significant, in shaping the volatility in all three assets, supporting earlier findings in the context of oil futures. Additionally, we find maturity has a relatively positive significance for bitcoin and oil futures price volatility. Furthermore, our analysis demonstrates that maturity affects the basis of bitcoin and gold positively -- confirming the general theory that the basis converges to zero as maturity nears for bitcoin and gold -- while oil is affected in both directions.
    Oil · Volatility · Market · Liquids · Standard deviation · Regression · Volatiles · Portfolio · Security · Statistics...
  • For small thermodynamic systems in contact with a heat bath, we determine the free energy by imposing the following two conditions. First, the quasi-static work in any configuration change is equal to the free energy difference. Second, the temperature dependence of the free energy satisfies the Gibbs-Helmholtz relation. We find that these prerequisites uniquely lead to the free energy of a classical system consisting of $N$-interacting identical particles, up to an additive constant proportional to $N$. The free energy thus determined contains the Gibbs factorial $N!$ in addition to the phase space integration of the Gibbs-Boltzmann factor. The key step in the derivation is to construct a quasi-static decomposition of small thermodynamic systems.
    Hamiltonian · Identical particles · Partition function · Permutation · Phase space · Canonical distribution · Jarzynski equality · Numerical simulation · Confinement · Quantum mechanics...
  • Thermodynamic relations in Tsallis statistics were studied in terms of physical quantities. An additive entropic variable related to the Tsallis entropy was introduced by assuming the form of the first law of thermodynamics. Fluctuations in Tsallis statistics were then derived in terms of physical quantities with the help of this entropic variable. It was shown that the mean squares of the fluctuations of the physical quantities in Tsallis statistics are the same as those in conventional thermodynamics. The mean squares of the fluctuations of the Tsallis entropy and the Tsallis temperature were also derived; the mean squares of their relative fluctuations are expressed in terms of heat capacities. It was shown that these fluctuations of the Tsallis quantities acquire $q$-dependent terms in Tsallis statistics with entropic parameter $q$.
    Statistics · Tsallis entropy · Entropy · Principle of maximum entropy · Compressibility · Exponential function · Canonical ensemble · Heavy ion collision · Fluctuation · Temperature...
  • We study the transport properties of dilute electrolyte solutions on the basis of the fluctuating hydrodynamic equations, a set of nonlinear Langevin equations for the ion densities and flow velocity. The nonlinearity of the Langevin equations generally leads to effective kinetic coefficients for the deterministic dynamics of the average ion densities and flow velocity; the effective coefficients generally differ from their counterparts in the Langevin equations and are frequency-dependent. Using the path-integral formalism involving auxiliary fields, we perform systematic perturbation calculations of the effective kinetic coefficients for ion diffusion, shear viscosity, and electrical conductivity, which govern the dynamics on large length scales. As novel contributions, we study the frequency dependence of the viscosity and conductivity in the one-loop approximation. Regarding the conductivity at finite frequencies, we derive the so-called electrophoretic part in addition to the relaxation part, the latter having been obtained by Debye and Falkenhagen; it is predicted that the combination of these two parts gives rise to a frequency $\omega_{\rm max}$, proportional to the salt density, at which the real part of the conductivity exhibits a maximum. The zero-frequency limits of the conductivity and shear viscosity coincide with the classical limiting laws for dilute solutions, derived by different means by Debye, Falkenhagen, and Onsager. As for the effective kinetic coefficients for slow ion diffusion on large length scales, our straightforward calculation yields the cross kinetic coefficient between cations and anions. Further, we discuss the possibility of extending the present study to more concentrated solutions.
    Graph · Langevin equation · Relaxation · Propagator · Diffusion coefficient · Shear viscosity · Viscosity · Vertex function · Auxiliary field · Self-energy...
  • The statistical distribution for the case of an adiabatically isolated body was obtained in the framework of covariant quantum theory and Wick's rotation in the complex time plane. The covariant formulation of the mechanics of an isolated system lies in the rejection of absolute time and the introduction of proper time as an independent dynamic variable. The equation of motion of proper time is the law of conservation of energy. In this case, the energy of an isolated system is an external parameter for the modified distribution instead of temperature.
    Proper time · Quantum theory · Statistical mechanics · Propagator · Quantization · Density matrix · Boltzmann distribution · Statistics · Partition function · Mechanical energy...
  • We consider the $N$ particle classical Riesz gas confined in a one-dimensional external harmonic potential with power law interaction of the form $1/r^k$ where $r$ is the separation between particles. As special limits it contains several systems such as Dyson's log-gas ($k\to 0^+$), Calogero-Moser model ($k=2$), 1d one component plasma ($k=-1$) and the hard-rod gas ($k\to \infty$). Despite its growing importance, only large-$N$ field theory and average density profile are known for general $k$. In this Letter, we study the fluctuations in the system by looking at the statistics of the gap between successive particles. This quantity is analogous to the well-known level spacing statistics which is ubiquitous in several branches of physics. We show that the variance goes as $N^{-b_k}$ and we find the $k$ dependence of $b_k$ via direct Monte Carlo simulations. We provide supporting arguments based on microscopic Hessian calculation and a quadratic field theory approach. We compute the gap distribution and study its system size scaling. Except in the range $-1<k<0$, we find scaling for all $k>-2$ with both Gaussian and non-Gaussian scaling forms.
    Monte Carlo method · Field theory · Statistics · Level spacing distribution · Hamiltonian · Coarse graining · Principal value · Langevin dynamics · Finite size effect · Boltzmann distribution...
  • We study the quantum Brownian motion of a charged particle moving in a harmonic potential in the presence of a uniform external magnetic field and linearly coupled to an Ohmic bath through momentum variables. We analyse the growth of the mean square displacement of the particle in the classical high-temperature domain and in the quantum low-temperature domain dominated by zero-point fluctuations. We also analyse the position response function and the long-time tails of various correlation functions. We notice some distinctive features, different from the usual case of a charged quantum Brownian particle in a magnetic field and linearly coupled to an Ohmic bath via position variables.
    Two-point correlation function · Brownian motion · Langevin equation · Charged particle · Confinement · Harmonic oscillator · Residue theorem · Ultracold atom · Langevin dynamics · Speed of light...
  • We study the non-equilibrium relaxational dynamics of a probe particle linearly coupled to a thermally fluctuating scalar field and subject to a harmonic potential, which provides a cartoon for an optically trapped colloid immersed in a fluid close to its bulk critical point. The average position of the particle initially displaced from the position of mechanical equilibrium is shown to feature long-time algebraic tails as the critical point of the field is approached, the universal exponents of which are determined in arbitrary spatial dimensions. As expected, this behavior cannot be captured by adiabatic approaches which assume fast field relaxation. The predictions of the analytic, perturbative approach are qualitatively confirmed by numerical simulations.
    Colloid · Relaxation · Critical point · Two-point correlation function · Numerical simulation · Degree of freedom · Fokker-Planck equation · Langevin equation · Hamiltonian · Critical exponent...
  • We outline a reduction scheme for a class of Brownian dynamics which leads to meaningful corrections to the Smoluchowski equation in the overdamped regime. The mobility coefficient of the reduced dynamics is obtained by exploiting the Dynamic Invariance principle, whereas the diffusion coefficient fulfils the Fluctuation-Dissipation theorem. Explicit calculations are carried out in the case of a harmonically bound particle. A quantitative pointwise representation of the reduction error is also provided and connections to both the Maximum Entropy method and the linear response theory are highlighted. Our study paves the way to the development of reduction procedures applicable to a wider class of diffusion processes.
    Manifold · Mobility · Smoluchowski equation · Diffusion coefficient · Two-point correlation function · Fokker-Planck equation · Entropy · Linear response theory · Kramers theorem · Boltzmann transport equation...
  • A change in a stochastic system has three representations: Probabilistic, statistical, and informational: (i) is based on random variable $u(\omega)\to\tilde{u}(\omega)$; this induces (ii) the probability distributions $F_u(x)\to F_{\tilde{u}}(x)$, $x\in\mathbb{R}^n$; and (iii) a change in the probability measure $\mathbb{P}\to\tilde{\mathbb{P}}$ under the same observable $u(\omega)$. In the informational representation a change is quantified by the Radon-Nikodym derivative $\ln\left( \frac{ d \tilde{\mathbb{P}}}{ d\mathbb{P}}(\omega)\right)=-\ln\left(\frac{ d F_u}{ d F_{\tilde{u}}}(x)\right)$ when $x=u(\omega)$. Substituting a random variable into its own density function creates a fluctuating entropy whose expectation has been given by Shannon. Informational representation of a deterministic transformation on $\mathbb{R}^n$ reveals entropic and energetic terms, and the notions of configurational entropy of Boltzmann and Gibbs, and potential of mean force of Kirkwood. Mutual information arises for correlated $u(\omega)$ and $\tilde{u}(\omega)$; and a nonequilibrium thermodynamic entropy balance equation is identified.
    Entropy · Mutual information · Entropy production · Statistics · Uniform distribution · Data science · Information theory · Balance equation · Statistical physics · Statistical mechanics...
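The "fluctuating entropy" in this abstract, a random variable substituted into the negative log of its own density, is easy to probe numerically: for a Gaussian its expectation should approach the differential entropy $\tfrac{1}{2}\ln(2\pi e\sigma^2)$. A small sketch (the Gaussian choice and sample size are illustrative):

```python
import numpy as np

# Fluctuating entropy: evaluate -ln p_u(x) at x = u(omega) itself.
# For u ~ N(0, sigma^2) the expectation is the differential entropy
# 0.5 * ln(2*pi*e*sigma^2), a standard result.
rng = np.random.default_rng(1)
sigma = 2.0
u = rng.normal(0.0, sigma, 200_000)           # samples of the observable

log_p = -0.5 * (u / sigma) ** 2 - 0.5 * np.log(2 * np.pi * sigma**2)
fluct_entropy = -log_p                        # one entropy value per sample
entropy = 0.5 * np.log(2 * np.pi * np.e * sigma**2)
```

`fluct_entropy.mean()` converges to `entropy` as the sample grows, while the sample-to-sample spread is exactly the fluctuation the abstract refers to.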
  • Following ideas of Szilard, Mandelbrot and Hill, we show that a statistical thermodynamic structure can emerge purely in the infinitely-large-data limit under a probabilistic framework, independent of the underlying details. Systems with distinct values of a set of observables are identified as different thermodynamic states, which are parameterized by the entropic forces conjugate to the observables. The ground state with zero entropic forces usually has a probabilistic model equipped with a symmetry of interest. The entropic forces lead to symmetry breaking for each particular system that produces the data, cf. the emergence of time correlations and the breakdown of detailed balance. Probabilistic models for the excited states are predicted by the Maximum Entropy Principle for sequences of i.i.d. and correlated Markov samples. Asymptotically equivalent models are also found by the Maximum Caliber Principle. With a novel derivation of Maximum Caliber, conceptual differences between the two principles are clarified. The emergent thermodynamics in the data infinitus limit has a mesoscopic origin from Maximum Caliber. In the canonical probabilistic models of Maximum Caliber, the variances of the observables and their conjugate forces satisfy the asymptotic thermodynamic uncertainty principle, which stems from the reciprocal-curvature relation between the "entropy" and "free energy" functions in the theory of large deviations. The mesoscopic origin of this reciprocality is identified. As a consequence of limit theorems in probability theory, the phenomenological statistical thermodynamics is universal without the need for mechanics.
    Entropy · Markov process · Statistics · Uncertainty principle · Symmetry breaking · Excited state · Entropy production · Curvature · Markov chain · Wentzel-Kramers-Brillouin...
  • We derive astroparticle constraints in different dark matter scenarios alternative to cold dark matter (CDM): thermal relic warm dark matter, WDM; fuzzy dark matter, $\psi$DM; self-interacting dark matter, SIDM; sterile neutrino dark matter, $\nu$DM. Our framework is based on updated determinations of the high-redshift UV luminosity functions for primordial galaxies out to redshift $z\sim 10$, on redshift-dependent halo mass functions in the above DM scenarios from numerical simulations, and on robust constraints on the reionization history of the Universe from recent astrophysical and cosmological datasets. First, we build up an empirical model of cosmic reionization characterized by two parameters, namely the escape fraction $f_{\rm esc}$ of ionizing photons from primordial galaxies, and the limiting UV magnitude $M_{\rm UV}^{\rm lim}$ down to which the extrapolated UV luminosity functions are steeply increasing. Second, we perform standard abundance matching of the UV luminosity function and the halo mass function, obtaining a relationship between UV luminosity and halo mass whose shape depends on an astroparticle quantity $X$ specific of each DM scenario (e.g., WDM particle mass); we exploit such a relation to introduce in the analysis a constraint from primordial galaxy formation, in terms of the threshold halo mass above which primordial galaxies can efficiently form stars. Third, we implement a sequential updating Bayesian MCMC technique to perform joint inference on the three parameters $f_{\rm esc}$, $M_{\rm UV}^{\rm lim}$, $X$, and to compare the outcomes of different DM scenarios on the reionization history. Finally, we highlight the relevance of our astroparticle estimates in predicting the behavior of the high-redshift UV luminosity function at faint, yet unexplored magnitudes, that may be tested with the advent of the James Webb Space Telescope.
    Cold dark matter · Galaxy Formation · Primordial galaxies · UV luminosity function · Virial mass · Warm dark matter · Halo mass function · Self-interacting dark matter · Reionization · Halo abundance matching...
  • 2205.05845 et al.
    The present white paper is submitted as part of the "Snowmass" process to help inform the long-term plans of the United States Department of Energy and the National Science Foundation for high-energy physics. It summarizes the science questions driving the Ultra-High-Energy Cosmic-Ray (UHECR) community and provides recommendations on the strategy to answer them in the next two decades.
    Ultra-high-energy cosmic ray · Energy Frontier experiment · Energy · Units...
  • TeV halos are regions of enhanced photon emissivity surrounding pulsars. While multiple sources have been discovered, a self-consistent explanation of their radial profile and spherically-symmetric morphology remains elusive due to the difficulty in confining high-energy electrons and positrons within ~20 pc regions of the interstellar medium. One proposed solution utilizes anisotropic diffusion to confine the electron population within a "tube" that is auspiciously oriented along the line of sight. In this work, we show that while such models may explain a unique source such as Geminga, the phase space of such solutions is very small and they are unable to simultaneously explain the size and approximate radial symmetry of the TeV halo population.
    PulsarLine of sightDiffusion coefficientGemingaTurbulenceInterstellar mediumInverse ComptonHigh Altitude Water CherenkovHalo populationMilky Way...
  • We provide general effective-theory arguments relating present-day discrepancies in semi-leptonic $B$-meson decays to signals in kaon physics, in particular lepton-flavour violating ones of the kind $K \to (\pi) e^\pm \mu^\mp$. We show that $K$-decay branching ratios of around $10^{-12} - 10^{-13}$ are possible, for effective-theory cutoffs around $5-15$ TeV compatible with discrepancies in $B\to K^{(\ast)} \mu\mu$ decays. We perform a feasibility study of the reach for such decays at LHCb, taking $K^+ \to \pi^+ \mu^\pm e^\mp$ as a benchmark. In spite of the long lifetime of the $K^+$ compared to the detector size, the huge statistics anticipated as well as the overall detector performance translate into encouraging results. These include the possibility to reach the $10^{-12}$ ballpark, and thereby significantly improve current limits. Our results advocate LHC's high-luminosity Upgrade phase, and support analogous sensitivity studies at other facilities. Given the performance uncertainties inherent in the Upgrade phase, our conclusions are based on a range of assumptions we deem realistic on the particle identification performance as well as on the kinematic reconstruction thresholds for the signal candidates.
    LHCb experimentLepton flavour violationMuonKaon decayKaonEffective theoryKinematicsBranching ratioStandard ModelPion...
  • The LHCb Collaboration's measurement of $R_K = \mathcal{B}(B^+ \to K^+ \mu^+ \mu^-)/\mathcal{B}(B^+ \to K^+ e^+ e^-)$ lies $2.6\sigma$ below the Standard Model prediction. Several groups suggest this deficit results from new lepton-nonuniversal interactions of muons. But nonuniversal leptonic interactions imply lepton flavor violation in $B$ decays at rates much larger than are expected in the Standard Model. A simple model shows that these rates could lie just below current limits. An interesting consequence of our model, that $\mathcal{B}(B_s \to \mu^+ \mu^-)_{\rm exp}/\mathcal{B}(B_s \to \mu^+ \mu^-)_{\rm SM} \cong R_K \cong 0.75$, is compatible with recent measurements of these rates. We stress the importance of searches for lepton flavor violation, especially for $B \to K \mu e$, $K \mu \tau$, and $B_s \to \mu e$, $\mu \tau$.
    MuonLepton flavour violationRadiative decayRare decaySemileptonic decayFlavour Changing Neutral CurrentsStandard ModelPositronLHCb experimentElectron...
  • We summarize the presentations made within Working Group 3 of the CKM2021 workshop. This working group is devoted to rare $B$, $D$ and $K$ decays, radiative and electroweak-penguin decays, including constraints on $V_{\rm td}/V_{\rm ts}$ and $\epsilon^\prime / \epsilon$. The working group has thus a very broad scope, and includes very topical subjects such as the coherent array of discrepancies in semi-leptonic $B$ decays. Each contribution is here summarized very succinctly with the aim of providing an overview of the main results. The reader interested in fuller details is referred to the individual contributions.
    Standard ModelBranching ratioLHCb experimentLeptoquarkFlavourLattice QCDWilson coefficientsIsospinFinal stateLattice (order)...
  • We prove a generalization of Tur\'{a}n's theorem proposed by Balogh and Lidick\'{y}.
    Turán's theoremGraphContradictionAlgebraProbabilityLagrangian...
  • Runko is a new open-source plasma simulation framework implemented in C++ and Python. It is designed to function as an easy-to-extend general toolbox for simulating astrophysical plasmas with different theoretical and numerical models. Computationally intensive low-level kernels are written in modern C++ taking advantage of polymorphic classes, multiple inheritance, and template metaprogramming. High-level functionality is operated with Python scripts. The hybrid program design ensures good code performance together with ease of use. The framework has a modular object-oriented design that allows the user to easily add new numerical algorithms to the system. The code can be run on various computing platforms ranging from laptops (shared-memory systems) to massively parallel supercomputer architectures (distributed-memory systems). The framework supports heterogeneous multiphysics simulations in which different physical solvers can be combined and run simultaneously. Here we showcase the framework's relativistic particle-in-cell (PIC) module by presenting (i) 1D simulations of relativistic Weibel instability, (ii) 2D simulations of relativistic kinetic turbulence in a suddenly stirred magnetically-dominated pair plasma, and (iii) 3D simulations of collisionless shocks in an unmagnetized medium.
    Particle-in-cellRankTurbulenceLattice (order)Weibel instabilityLorentz factorInstabilityMagnetohydrodynamicsPythonCharged particle...
  • Relativistic magnetized jets, such as those from AGN, GRBs and XRBs, are susceptible to current- and pressure-driven MHD instabilities that can lead to particle acceleration and non-thermal radiation. Here we investigate the development of these instabilities through 3D kinetic simulations of cylindrically symmetric equilibria involving toroidal magnetic fields with electron-positron pair plasma. Generalizing recent treatments by Alves et al. (2018) and Davelaar et al. (2020), we consider a range of initial structures in which the force due to toroidal magnetic field is balanced by a combination of forces due to axial magnetic field and gas pressure. We argue that the particle energy limit identified by Alves et al. (2018) is due to the finite duration of the fast magnetic dissipation phase. We find a rather minor role of electric fields parallel to the local magnetic fields in particle acceleration. In all investigated cases a kink mode arises in the central core region with a growth timescale consistent with the predictions of linearized MHD models. In the case of a gas-pressure-balanced (Z-pinch) profile, we identify a weak local pinch mode well outside the jet core. We argue that pressure-driven modes are important for relativistic jets, in regions where sufficient gas pressure is produced by other dissipation mechanisms.
    DissipationPinchInstabilityToroidal magnetic fieldZ-pinchRelativistic jetPositronAxial magnetic fieldCurrent densityMagnetic reconnection...
  • In this work, we study the magnetic field morphology of selected star-forming clouds spread over the galactic latitude ($b$) range $-10^\circ$ to $10^\circ$. Polarimetric observations of the clouds CB24, CB27 and CB188 were conducted from ARIES, Manora Peak, Nainital, India, to study the magnetic field geometry of these clouds. These observations are combined with those of 14 further low-latitude clouds available in the literature. Analyzing the polarimetric data of 17 clouds, we find that the alignment between the envelope magnetic field ($\theta_{B}^{env}$) and the Galactic plane ($\theta_{GP}$) of the low-latitude clouds varies with their galactic longitude ($l$). We observe a strong correlation between the longitude ($l$) and the offset ($\theta_{off}=|\theta_B^{env}-\theta_{GP}|$), which shows that $\theta_{B}^{env}$ is parallel to the Galactic plane (GP) when the clouds are situated in the region $115^\circ<l<250^\circ$. However, $\theta_{B}^{env}$ has its own local deflection, irrespective of the orientation of $\theta_{GP}$, when the clouds are at $l<100^\circ$ or $l>250^\circ$. To check the consistency of our results, the stellar polarization data available in the Heiles (2000) catalogue are overlaid on DSS images of the clouds, showing the mean polarization vectors of field stars. The results are largely consistent with the Heiles data. The effect of turbulence in the clouds is also studied, as it may play an important role in the misalignment observed between $\theta_{B}^{env}$ and $\theta_{GP}$. We use \textit{Herschel} \textit{SPIRE} 500 $\mu m$ and \textit{SCUBA} 850 $\mu m$ dust continuum emission maps to understand the density structure of the clouds.
    OrientationPolarization vectorStarGalactic CenterPosition angleMolecular cloudGalactic planeStar formationTurbulenceGalactic latitude...
  • In the past decade, electroweak penguin decays have provided a number of precision measurements, becoming one of the most competitive ways to search for New Physics describing phenomena beyond the Standard Model. An overview of the measurements made at the $B$ factories and hadron colliders is given, and the experimental methods are presented. Experimental measurements required to provide further insight into present indications of New Physics are discussed.
    Standard ModelBranching ratioFinal stateElectroweakMuonWilson coefficientsFlavourForm factorLHCb experimentCollider...
  • Antimatter is one of the most fascinating aspects of Particle Physics, and also one of the most unknown. In this article we concisely explain what antimatter is and the distinction between primordial and secondary antimatter, how it is produced, where it can be found, the experiments carried out at CERN to create and analyze antiatoms, the problem of the matter-antimatter asymmetry, and the medical and technological applications of antimatter in our society.
    AntimatterCERNBaryon asymmetry of the UniverseParticle physics...
  • In laser-wakefield acceleration, an ultra-intense laser pulse is focused into an underdense plasma in order to accelerate electrons to relativistic velocities. In most cases, the pulses consist of multiple optical cycles and the interaction is well described in the framework of the ponderomotive force where only the envelope of the laser has to be considered. But when using single-cycle pulses, the ponderomotive approximation breaks down, and the actual waveform of the laser has to be taken into account. In this paper, we use near-single cycle laser pulses to drive a laser-wakefield accelerator. We observe variations of the electron beam pointing on the order of 10 mrad in the polarisation direction, as well as 30% variations of the beam charge, locked to the value of the controlled laser carrier-envelope phase, in both nitrogen and helium plasma. Those findings are explained through particle-in-cell simulations indicating that low-emittance, ultra-short electron bunches are periodically injected off-axis by the transversally oscillating bubble associated with the slipping carrier-envelope phase.
    LasersIonizationTransverse momentumBetatronRelativistic electronParticle-in-cellIntensityTotal-Variation regularizationPhase effectSpectrometers...
  • Blockchains provide environments where parties can interact transparently and securely peer-to-peer without needing a trusted third party. Parties can trust the integrity and correctness of transactions and the verifiable execution of binary code on the blockchain (smart contracts) inside the system. Including information from outside the blockchain remains challenging. One challenge is data privacy. In a public system, shared data become public and, coming from a single source, often lack credibility. A private system gives the parties control over their data and sources but trades away positive aspects such as transparency. Often the most critical information is not the data itself but the result of a computation performed on it. An example is research data certification. To keep data private but still prove data provenance, researchers can store a hash value of those data on the blockchain. This hash value is either calculated locally on private data, without the chance for validation, or calculated on the blockchain, meaning that the data must be published and stored on the blockchain -- a problem for the overall amount of data stored on and distributed with the ledger. A system we call moving smart contracts bypasses this problem: data remain local, but trusted nodes can access them and execute trusted smart contract code stored on the blockchain. This method avoids the system-wide distribution of research data and makes them accessible and verifiable with trusted software.
    BlockchainSoftwarePrivacyProgrammingIntellectual PropertyEncryptionPythonP2pResearch and DevelopmentFile system...
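    The hash-on-chain certification step described above can be sketched in a few lines; `certify` and `ledger` are illustrative names, not the paper's actual interface:

```python
import hashlib

def certify(data: bytes) -> str:
    """Return the SHA-256 digest of the data; only this digest needs to be
    stored on-chain to later prove provenance of the locally held data."""
    return hashlib.sha256(data).hexdigest()

# Stand-in for on-chain storage: the ledger holds hashes, never the raw data.
ledger = [certify(b"research data")]
```

    A verifier recomputes `certify(data)` on the locally held data and checks the digest against the ledger; any modification of the data breaks the match, while the data itself never leaves the researcher's machine.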
  • The past decades have witnessed the flourishing of non-Hermitian physics in non-conservative systems, leading to unprecedented phenomena such as unidirectional invisibility, enhanced sensitivity and, more recently, novel topological features such as bulk Fermi arcs. Among them, growing effort has been invested in an intriguing phenomenon known as the non-Hermitian skin effect (NHSE). Here, we review the recent progress in this emerging field. Starting from the one-dimensional (1D) case, the fundamental concepts of NHSE, its minimal model, and its physical meaning and consequences are elaborated in detail. In particular, we discuss the NHSE enriched by lattice symmetries, which gives rise to unique non-Hermitian topological properties with a revised bulk-boundary correspondence (BBC) and new definitions of topological invariants. We then extend the discussion to two and higher dimensions, where dimensional surprises enable even more versatile NHSE phenomena. Extensions of NHSE assisted by extra degrees of freedom such as long-range coupling, pseudospins, magnetism, non-linearity and crystal defects are also reviewed. This is followed by a survey of contemporary experimental progress on NHSE. Finally, we provide an outlook on possible future directions and developments.
    Skin effectLattice (order)Topological invariantDegree of freedomMagnetismMinimal modelsDimensionsSymmetryFieldCrystal...
  • The paper describes practical work for students that visually clarifies the mechanism of the Monte Carlo method as applied to approximating the value of Pi. Considering the traditional quadrant (circular sector) inscribed in a square, we demonstrate an original algorithm for generating random points on paper: arbitrarily tear a paper blank into small pieces (the first experiment). In a similar way, a second experiment (with a preliminary staining procedure using bright colors) can be used to demonstrate the quadratic dependence of the area of a circle on its radius. Manipulating torn paper as a random sampling algorithm can be applied to other teaching problems in physics.
    Monte Carlo methodQuadrantsAlgorithms
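    The torn-paper experiment is a physical realization of the standard quadrant method, which can be sketched as follows (function name illustrative):

```python
import random

def estimate_pi(n_points, seed=0):
    """Estimate Pi by scattering points uniformly over the unit square and
    counting the fraction that lands inside the inscribed quarter circle."""
    rng = random.Random(seed)
    inside = sum(
        1 for _ in range(n_points)
        if rng.random() ** 2 + rng.random() ** 2 <= 1.0
    )
    # Ratio of areas: quarter circle / unit square = pi / 4.
    return 4.0 * inside / n_points
```

    Each torn paper scrap plays the role of one random point; counting scraps inside the quadrant gives the same area ratio the code computes.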
  • We explore the assumption, widely used in many astrophysical calculations, that the stellar initial mass function (IMF) is universal across all galaxies. By considering both a canonical Salpeter-like IMF and a non-universal IMF, we are able to compare the effect of different IMFs on multiple observables and derived quantities in astrophysics. Specifically, we consider a non-universal IMF which varies as a function of the local star formation rate, and explore the effects on the star formation rate density (SFRD), the extragalactic background light, the supernova (both core-collapse and thermonuclear) rates, and the diffuse supernova neutrino background. Our most interesting result is that our adopted varying IMF leads to much greater uncertainty on the SFRD at $z \approx 2-4$ than is usually assumed. Indeed, we find a SFRD (inferred using observed galaxy luminosity distributions) that is a factor of $\gtrsim 3$ lower than canonical results obtained using a universal Salpeter-like IMF. Secondly, the non-universal IMF we explore implies a reduction in the core-collapse supernova rate by a factor of $\sim2$, compared against a universal IMF. The other potential tracers are only slightly affected by changes to the properties of the IMF. We find that currently available data do not provide a clear preference for a universal or non-universal IMF. However, improvements to measurements of the star formation rate and core-collapse supernova rate at redshifts $z \gtrsim 2$ may offer the best prospects for discernment.
    Initial mass functionStar formation rateGalaxyDiffuse supernova neutrino backgroundStarLuminositySupernovaCalibrationNeutrinoCore-collapse supernova...
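    As a toy illustration of how the assumed IMF propagates into a derived quantity such as the core-collapse supernova rate, the sketch below computes the number fraction of core-collapse progenitors (stars above $\sim 8\,M_\odot$) for a power-law IMF; the mass limits and slopes are illustrative defaults, not values taken from the paper:

```python
def cc_progenitor_fraction(m_lo=0.1, m_hi=100.0, m_cc=8.0, alpha=2.35):
    """Number fraction of stars above m_cc (in solar masses) for a
    power-law IMF dN/dm ~ m^-alpha; alpha = 2.35 is the Salpeter slope."""
    def number_integral(a, b):
        # Closed-form integral of m^-alpha from a to b.
        p = 1.0 - alpha
        return (b ** p - a ** p) / p
    return number_integral(m_cc, m_hi) / number_integral(m_lo, m_hi)
```

    Steepening the slope (larger `alpha`) shifts the IMF toward low-mass stars and lowers the progenitor fraction, which is the kind of dependence the abstract's core-collapse-rate comparison probes.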
  • When interpreted within the standard framework of Newtonian gravity and dynamics, the kinematics of stars and gas in dwarf galaxies reveals that most of these systems are completely dominated by their dark matter halos. These dwarf galaxies are thus among the best astrophysical laboratories to study the structure of dark halos and the nature of dark matter. We review the properties of the dwarf galaxies of the Local Group from the point of view of stellar dynamics. After describing the observed kinematics of their stellar components and providing an overview of the dynamical modelling techniques, we look into the dark matter content and distribution of these galaxies, as inferred from the combination of observed data and dynamical models. We also briefly touch upon the prospects of using nearby dwarf galaxies as targets for indirect detection of dark matter via annihilation or decay emission.
    Local groupDark matterKinematicsMilky WayDark matter haloUltra-faint dwarf spheroidal galaxyStarProper motionDwarf galaxyGalaxy...
  • Constraints on dark matter halo masses from weak gravitational lensing can be improved significantly by using additional information about the morphology of their density distribution, leading to tighter cosmological constraints derived from the halo mass function. This work is the first of two in which we investigate the accuracy of halo morphology and mass measurements in 2D and 3D. To this end, we determine several halo physical properties in the MICE-Grand Challenge dark matter only simulation. We present a public catalogue of these properties that includes density profiles and shape parameters measured in 2D and 3D, the halo centre at the peak of the 3D density distribution as well as the gravitational and kinetic energies and angular momentum vectors. The density profiles are computed using spherical and ellipsoidal radial bins, taking into account the halo shapes. We also provide halo concentrations and masses derived from fits to 2D and 3D density profiles using NFW and Einasto models for halos with more than $1000$ particles ($\gtrsim 3 \times 10^{13} h^{-1} M_{\odot}$). We find that the Einasto model provides better fits compared to NFW, regardless of the halo relaxation state and shape. The mass and concentration parameters of the 3D density profiles derived from fits to the 2D profiles are in general biased. Similar biases are obtained when constraining mass and concentrations using a weak-lensing stacking analysis. We show that these biases depend on the radial range and density profile model adopted in the fitting procedure, but not on the halo shape.
    Navarro-Frenk-White profileDark matter particleDark matter haloVirial massRelaxationHalo concentrationsWeak lensingDensity contrastWeak lensing mass estimateDark matter...
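    The two density models compared above have simple closed forms; a minimal sketch (parameter names illustrative):

```python
import math

def nfw_density(r, rho_s, r_s):
    """NFW profile: rho(r) = rho_s / ((r/r_s) * (1 + r/r_s)^2)."""
    x = r / r_s
    return rho_s / (x * (1.0 + x) ** 2)

def einasto_density(r, rho_s, r_s, alpha=0.18):
    """Einasto profile: rho(r) = rho_s * exp(-(2/alpha) * ((r/r_s)^alpha - 1)).
    The extra shape parameter alpha is what gives Einasto fits their edge
    over NFW across relaxation states and halo shapes."""
    x = r / r_s
    return rho_s * math.exp(-(2.0 / alpha) * (x ** alpha - 1.0))
```

    Fitting either form to binned density profiles yields the mass and concentration parameters whose 2D-vs-3D biases the paper quantifies.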
  • Feedback to the interstellar medium (ISM) from ionising radiation, stellar winds and supernovae is central to regulating star formation in galaxies. Due to their low mass ($M_{*} < 10^{9}\,M_\odot$), dwarf galaxies are particularly susceptible to such processes, making them ideal sites to study the detailed physics of feedback. In this perspective, we summarise the latest observational evidence for feedback from star-forming regions and how this drives the formation of 'superbubbles' and galaxy-wide winds. We discuss the important role of external ionising radiation -- 'reionisation' -- for the smallest galaxies. We also discuss the observational evidence that this feedback directly impacts galaxy properties such as their star formation histories, metal content, colours, sizes, morphologies and even their inner dark matter densities. We conclude with a look to the future, summarising the key questions that remain unanswered and listing some of the outstanding challenges for galaxy formation theories.
    Dwarf galaxyStar formationGalaxyStellar feedbackInterstellar mediumDark matterStarStar formation rateHealth informaticsOf stars...
  • In recent years, deep neural networks have been successful in both industry and academia, especially for computer vision tasks. The great success of deep learning is mainly due to its scalability to encode large-scale data and to maneuver billions of model parameters. However, it is a challenge to deploy these cumbersome deep models on devices with limited resources, e.g., mobile phones and embedded devices, not only because of the high computational complexity but also the large storage requirements. To this end, a variety of model compression and acceleration techniques have been developed. As a representative type of model compression and acceleration, knowledge distillation effectively learns a small student model from a large teacher model, and has received rapidly increasing attention from the community. This paper provides a comprehensive survey of knowledge distillation from the perspectives of knowledge categories, training schemes, teacher-student architectures, distillation algorithms, performance comparisons and applications. Furthermore, challenges in knowledge distillation are briefly reviewed, and comments on future research are offered.
    DistillationArchitectureGraphAttentionDeep Neural NetworksNeural networkComputational linguisticsData samplingTraining setQuantization...
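    The core teacher-student objective, commonly attributed to Hinton et al. [2015], can be sketched without any framework dependencies; this is an illustrative minimal version, not the survey's code:

```python
import math

def softened_softmax(logits, temperature):
    """Softmax over temperature-divided logits; higher temperatures expose
    the teacher's 'dark knowledge' hidden in the small probabilities."""
    exps = [math.exp(z / temperature) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(student_logits, teacher_logits, temperature=4.0):
    """KL divergence between softened teacher and student distributions,
    scaled by T^2 so gradient magnitudes stay comparable across T."""
    p = softened_softmax(teacher_logits, temperature)
    q = softened_softmax(student_logits, temperature)
    return temperature ** 2 * sum(
        pi * math.log(pi / qi) for pi, qi in zip(p, q)
    )
```

    In practice this term is mixed with the ordinary cross-entropy on hard labels; the loss is zero exactly when the student reproduces the teacher's softened distribution.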
  • Large-amplitude Alfv\'en waves are subject to parametric decays which can have important consequences in space, astrophysical, and fusion plasmas. Though this Alfv\'en wave parametric decay instability was predicted decades ago, its observational evidence has not been well established, stimulating considerable interest in laboratory demonstration of the instability and associated numerical modeling. Here, we report on novel hybrid simulation modeling of the Alfv\'en wave parametric decay instability in a laboratory plasma (based on the Large Plasma Device), including collisionless ion kinetics. Using realistic wave injection and wave-plasma parameters we identify the threshold Alfv\'en wave amplitudes and frequencies required for triggering the instability in the bounded plasma. These threshold behaviors are corroborated by simple theoretical considerations. Compounding effects such as finite source sizes and ion-neutral collisions are briefly discussed. These hybrid simulations represent a promising tool for investigating laboratory Alfv\'en wave dynamics and our results may help to guide the first laboratory demonstration of the parametric decay instability.
    InstabilityAcoustic waveLandau dampingElectron temperaturePlasma parameterAlfvén waveDamping rateThermal speedPlane waveAstrophysical plasma...
  • Deep neural networks (DNNs) have recently achieved great success in many visual recognition tasks. However, existing deep neural network models are computationally expensive and memory intensive, hindering their deployment in devices with low memory resources or in applications with strict latency requirements. Therefore, a natural thought is to perform model compression and acceleration in deep networks without significantly decreasing the model performance. During the past five years, tremendous progress has been made in this area. In this paper, we review the recent techniques for compacting and accelerating DNN models. In general, these techniques are divided into four categories: parameter pruning and quantization, low-rank factorization, transferred/compact convolutional filters, and knowledge distillation. Methods of parameter pruning and quantization are described first, after which the other techniques are introduced. For each category, we also provide insightful analysis of the performance, related applications, advantages, and drawbacks. Then we go through some very recent successful methods, for example, dynamic capacity networks and stochastic depth networks. After that, we survey the evaluation metrics, the main datasets used for evaluating the model performance, and recent benchmark efforts. Finally, we conclude this paper, discuss the remaining challenges, and suggest possible directions for future work.
    Deep Neural NetworksQuantizationRankConvolution Neural NetworkFully connected layerArchitectureDistillationClassificationNeural networkSparsity...
  • We introduce a data-free quantization method for deep neural networks that does not require fine-tuning or hyperparameter selection. It achieves near-original model performance on common computer vision architectures and tasks. 8-bit fixed-point quantization is essential for efficient inference on modern deep learning hardware. However, quantizing models to run in 8-bit is a non-trivial task, frequently leading to either significant performance reduction or engineering time spent on training a network to be amenable to quantization. Our approach relies on equalizing the weight ranges in the network by making use of a scale-equivariance property of activation functions. In addition, the method corrects biases in the error that are introduced during quantization. This improves quantized-model accuracy, and can be applied to many common computer vision architectures with a straightforward API call. For common architectures, such as the MobileNet family, we achieve state-of-the-art quantized model performance. We further show that the method also extends to other computer vision architectures and tasks such as semantic segmentation and object detection.
    QuantizationArchitectureImage ProcessingActivation functionDeep learningSemantic segmentationInferenceObject detectionHyperparameterApplication programming interface...
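    The scale-equivariance idea can be sketched for a pair of layers joined by ReLU: scaling output channel $i$ of the first layer by $1/s_i$ and the matching input channel of the second by $s_i$ leaves the network function unchanged (since $\mathrm{ReLU}(x/s) = \mathrm{ReLU}(x)/s$ for $s > 0$) while equalizing the per-channel weight ranges that a shared quantization grid must cover. A minimal list-based sketch, not the paper's implementation:

```python
def equalize_ranges(w1_rows, w2_cols):
    """Cross-layer range equalization. w1_rows[i] holds the weights that
    produce output channel i of layer 1; w2_cols[i] holds the layer-2
    weights consuming that channel. After rescaling, both per-channel
    ranges equal sqrt(r1 * r2)."""
    eq1, eq2 = [], []
    for row, col in zip(w1_rows, w2_cols):
        r1 = max(abs(w) for w in row)  # per-channel weight range, layer 1
        r2 = max(abs(w) for w in col)  # per-channel weight range, layer 2
        s = (r1 / r2) ** 0.5           # choose s so that r1/s == r2*s
        eq1.append([w / s for w in row])
        eq2.append([w * s for w in col])
    return eq1, eq2
```

    Balanced ranges mean no single channel forces a coarse quantization grid on the rest, which is where the data-free accuracy gain comes from.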
  • We present a post-training weight pruning method for deep neural networks that achieves accuracy levels tolerable for the production setting and that is sufficiently fast to be run on commodity hardware such as desktop CPUs or edge devices. We propose a data-free extension of the approach for computer vision models based on automatically-generated synthetic fractal images. We obtain state-of-the-art results for data-free neural network pruning, with ~1.5% top@1 accuracy drop for a ResNet50 on ImageNet at 50% sparsity rate. When using real data, we are able to get a ResNet50 model on ImageNet with 65% sparsity rate in 8-bit precision in a post-training setting with a ~1% top@1 accuracy drop. We release the code as a part of the OpenVINO(TM) Post-Training Optimization tool.
    SparsityDeep Neural NetworksQuantizationImage ProcessingOptimizationFractalTraining setCalibrationInferenceDistillation...
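    As a baseline for what such post-training methods refine, plain magnitude pruning can be sketched in a few lines (an illustrative sketch, not the OpenVINO tool's algorithm):

```python
def prune_by_magnitude(weights, sparsity):
    """Zero out the fraction `sparsity` of weights with the smallest
    magnitudes; accuracy-aware pipelines then decide how far this can be
    pushed (e.g. 50-65% on a ResNet50) while bounding the accuracy drop."""
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    k = int(round(len(weights) * sparsity))
    pruned = list(weights)
    for i in order[:k]:
        pruned[i] = 0.0
    return pruned
```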
  • Lately, post-training quantization methods have gained considerable attention, as they are simple to use and require only a small unlabeled calibration set. This small dataset cannot be used to fine-tune the model without significant over-fitting. Instead, these methods only use the calibration set to set the activations' dynamic ranges. However, such methods have typically resulted in significant accuracy degradation when used below 8 bits (except on small datasets). Here we aim to break the 8-bit barrier. To this end, we minimize the quantization errors of each layer separately by optimizing its parameters over the calibration set. We empirically demonstrate that this approach is: (1) much less susceptible to over-fitting than standard fine-tuning approaches, and can be used even on a very small calibration set; and (2) more powerful than previous methods, which only set the activations' dynamic ranges. Furthermore, we demonstrate how to optimally allocate the bit-widths for each layer, while constraining accuracy degradation or model compression, by proposing a novel integer programming formulation. Finally, we suggest global model statistics tuning to correct biases introduced during quantization. Together, these methods yield state-of-the-art results for both vision and text models. For instance, on ResNet50, we obtain less than 1% accuracy degradation with 4-bit weights and activations in all layers but the smallest two. We have open-sourced our code.
    QuantizationCalibrationStatisticsOverfittingProgrammingOptimizationTraining setAttentionInferenceMean squared error...
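    The idea of optimizing quantization parameters over a calibration set, rather than naively taking the full dynamic range, can be sketched as a grid search for the clipping scale that minimizes squared error; a simplified illustration, not the paper's layer-wise optimizer:

```python
def quantize(x, scale, bits=8):
    """Uniform symmetric quantization of a scalar to `bits` bits."""
    qmax = 2 ** (bits - 1) - 1
    q = max(-qmax - 1, min(qmax, round(x / scale)))
    return q * scale

def calibrate_scale(values, bits=8, n_grid=100):
    """Pick the scale minimizing squared quantization error on a small
    calibration set; clipping outliers often beats using the full range."""
    max_abs = max(abs(v) for v in values)
    qmax = 2 ** (bits - 1) - 1
    best_scale, best_err = max_abs / qmax, float("inf")
    for i in range(1, n_grid + 1):
        scale = (max_abs * i / n_grid) / qmax
        err = sum((v - quantize(v, scale, bits)) ** 2 for v in values)
        if err < best_err:
            best_scale, best_err = scale, err
    return best_scale
```

    Because the search needs only forward evaluations on a handful of samples, it avoids the over-fitting risk of fine-tuning on the same tiny set.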
  • Overparameterized networks trained to convergence have shown impressive performance in domains such as computer vision and natural language processing. Pushing state of the art on salient tasks within these domains corresponds to these models becoming larger and more difficult for machine learning practitioners to use given the increasing memory and storage requirements, not to mention the larger carbon footprint. Thus, in recent years there has been a resurgence in model compression techniques, particularly for deep convolutional neural networks and self-attention based networks such as the Transformer. Hence, this paper provides a timely overview of both old and current compression techniques for deep neural networks, including pruning, quantization, tensor decomposition, knowledge distillation and combinations thereof. We assume a basic familiarity with deep learning architectures\footnote{For an introduction to deep learning, see ~\citet{goodfellow2016deep}}, namely, Recurrent Neural Networks~\citep[(RNNs)][]{rumelhart1985learning,hochreiter1997long}, Convolutional Neural Networks~\citep{fukushima1980neocognitron}~\footnote{For an up to date overview see~\citet{khan2019survey}} and Self-Attention based networks~\citep{vaswani2017attention}\footnote{For a general overview of self-attention networks, see ~\citet{chaudhari2019attentive}.},\footnote{For more detail and their use in natural language processing, see~\citet{hu2019introductory}}. Most of the papers discussed are proposed in the context of at least one of these DNN architectures.
    QuantizationArchitectureDistillationSparsityConvolution Neural NetworkAttentionRankRegularizationHidden layerNeural network...