Recently bookmarked papers

with concepts:
  • One of the key points in music recommendation is authoring engaging playlists according to sentiment and emotions. While previous works were mostly based on audio for music discovery and playlist generation, we take advantage of our synchronized lyrics dataset to combine text representations and music features in a novel way; we therefore introduce the Synchronized Lyrics Emotion Dataset. Unlike other approaches that randomly exploited the audio samples and the whole text, our data is split according to the temporal information provided by the synchronization between lyrics and audio. This work compares text-based and audio-based deep learning classification models using different techniques from the Natural Language Processing and Music Information Retrieval domains. From the experiments on audio we conclude that using the vocals only, instead of the whole audio track, improves the overall performance of the audio classifier. In the lyrics experiments we exploit state-of-the-art word representations applied to the main deep learning architectures available in the literature. In our benchmarks, the Bilinear LSTM classifier with Attention based on fastText word embeddings performs better than the CNN applied to audio.
    Classification, Long short term memory, Convolutional neural network, Embedding, Computational linguistics, Attention, Architecture, Text Classification, Deep learning, Music information retrieval...
  • We use the IllustrisTNG (TNG) cosmological simulations to provide theoretical expectations for the dark matter mass fractions (DMFs) and circular velocity profiles of galaxies. TNG predicts flat circular velocity curves for $z = 0$ Milky Way (MW)-like galaxies beyond a few kpc from the galaxy centre, in better agreement with observational constraints than its predecessor, Illustris. TNG also predicts an enhancement of the dark matter mass within the 3D stellar half-mass radius ($r_\mathrm{half}$; $M_\mathrm{200c} = 10^{10}-10^{13}\,\mathrm{M}_{\odot}$, $z \le 2$) compared to its dark-matter-only and Illustris counterparts. This enhancement leads TNG present-day galaxies to be dominated by dark matter within their inner regions, with $f_\mathrm{DM}(<r_\mathrm{half})\gtrsim0.5$ at all masses and with a minimum for MW-mass galaxies. The 1$\sigma$ scatter is $\lesssim$ 10 per cent at all apertures, which is smaller than that inferred by some observational datasets, e.g. 40 per cent from the SLUGGS survey. TNG agrees with the majority of the observationally inferred values for elliptical galaxies once a consistent IMF is adopted (Chabrier) and the DMFs are measured within the same apertures. The DMFs measured within $r_\mathrm{half}$ increase towards lower redshifts: this evolution is dominated by the increase in galaxy size with time. At $z\sim2$, the DMF in disc-like TNG galaxies decreases with increasing galaxy mass, with $f_\mathrm{DM}(<r_\mathrm{half}) \sim 0.10-0.65$ for $10^{10} \lesssim M_{\rm stars}/\mathrm{M}_{\odot} \lesssim 10^{12}$, and is about two times higher than if TNG galaxies resided in Navarro-Frenk-White dark matter haloes unaffected by baryonic physics. It remains to be properly assessed whether recent observational estimates of the DMFs at $z\sim2$ rule out the contraction of the dark matter haloes predicted by the TNG model.
    Galaxy, Dark matter fraction, IllustrisTNG simulation, Dark matter, Illustris simulation, Milky Way, Stellar mass, Circular velocity, Virial mass, Dark matter halo...
  • Metrics of model goodness-of-fit, model comparison, and model parameter estimation are the main categories of statistical problems in science. Bayesian and frequentist methods that address these questions often rely on a likelihood function, which describes the plausibility of model parameters given observed data. In some complex systems or experimental setups, predicting the outcome of a model cannot be done analytically, and Monte Carlo techniques are used. In this paper, we present a new analytic likelihood that takes into account Monte Carlo uncertainties and is appropriate for use in both large and small statistics regimes. Our formulation performs better than semi-analytic methods, prevents strong claims from being based on biased statements, and yields better coverage properties than available methods.
    Monte Carlo method, Statistics, Statistical error, Frequentist approach, Test statistic, Gamma distribution, Gaussian distribution, Likelihood function, Systematic error, Bayesian...
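    A minimal sketch of the general construction (not necessarily the paper's exact formulation): when the Monte Carlo prediction is a weighted sum of events, its own statistical uncertainty can be folded into the likelihood by marginalising the Poisson mean over a gamma density with matching mean and variance, which yields a negative-binomial form. The helper below is hypothetical and uses SciPy.

```python
import numpy as np
from scipy.stats import nbinom, poisson

def mc_marginal_loglike(k, weights):
    """Log-likelihood of observing k events given a Monte Carlo
    prediction built from weighted events (illustrative helper).

    mu = sum(w) is the MC-estimated expectation and sigma^2 = sum(w^2)
    its MC statistical variance.  Marginalising Poisson(k | lam) over
    lam ~ Gamma(alpha, beta) with matching mean and variance gives a
    negative binomial, which widens as the MC uncertainty grows.
    """
    w = np.asarray(weights, dtype=float)
    mu = w.sum()
    var = (w ** 2).sum()
    alpha = mu ** 2 / var          # gamma shape
    beta = mu / var                # gamma rate
    # scipy's nbinom(n, p) is the Poisson-gamma marginal with
    # n = alpha and p = beta / (1 + beta).
    return nbinom.logpmf(k, alpha, beta / (1.0 + beta))

# With many equal-weight MC events the MC error vanishes and the
# marginal approaches the plain Poisson likelihood:
print(mc_marginal_loglike(12, np.full(100_000, 1e-4)))
print(poisson.logpmf(12, 10.0))
```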
  • Did Indian and Babylonian astronomy evolve in isolation, was there mutual influence, or was one dependent on the other? Scholars have debated these questions for more than two centuries, and opinion has swung one way or the other with time. The similarities between the two systems that have been investigated are: the use of 30 divisions of the lunar month; the 360 divisions of the civil year; the length of the year; and the solar zodiac. Some have wondered if the Babylonian planetary tables might have played a role in the theories of the siddhantas. In this essay I shall go over the essentials of early Indian and Babylonian astronomy and summarize the latest views on the relationship between them. I shall show that the key ideas found in the Babylonian astronomy of 700 BC are already present in the Vedic texts, which even by the most conservative reckoning are centuries older than this period. I shall also show that the solar zodiac (rashis) was used in Vedic India, and I shall present a plausible derivation of the symbols of the solar zodiac from the deities of the lunar segments. This does not mean that Babylonian astronomy and astrology are derived from the Indian tradition. If there was any borrowing at all, it was restricted to the most general ideas only. The nature of the Indian and Babylonian astronomical methods is quite different. I propose that it is most likely that Babylonian astronomy emerged independently.
    Moon, Sun, Deity, Ecliptic, Star, Planet, Calendars, Winter solstice, Observational astronomy, Vernal equinox...
  • This is a very gentle introductory course on quantum mechanics aimed at the first years of the undergraduate level. The basic concepts are introduced, with many applications and illustrations. The text contains 12 short chapters of equal length, ideal for a one-term course. The license allows reuse of figures and text under the Attribution-Noncommercial-ShareAlike conditions.
    Quantum mechanics, Wave packet, Hamiltonian, Eigenfunction, Complex number, Vector space, Hydrogen atom, Expectation Value, Bosonization, Strangeness...
  • This paper provides an overview of the birth and early development of Indian astronomy. Taking account of significant new findings from archaeology and literary analysis, it is shown that early mathematical astronomy arose in India in the second millennium BC. The paper reviews the astronomy of the period of the Vedas, the Brahmanas, and the Vedanga Jyotisha. The origins of Puranic cosmology are also explained.
    Sun, Earth, Planet, Mercury, Venus, Jupiter, Deity, Calendars, Winter solstice, Saturn...
  • It is said that Einstein's conceptual base for the theory of relativity was the philosophy formulated by Immanuel Kant. Is it then possible to see how Kant played a role in Einstein's thinking without reading Kant's books? This question arises because it is not possible for physicists to read Kant's writings. Yes, it is possible if we use the method of physics. It is also known that Kant's mode of thinking was profoundly affected by the geography of Koenigsberg, where he spent his entire life of eighty years. We examine what aspect of this geography led Kant to create the philosophy upon which Einstein's concept of relativity was based. It is pointed out that the Eastern philosophy of Taoism is a product of a geographical environment similar to that of Kant's Koenigsberg, and therefore that Kantianism is strikingly similar to Taoism.
    Relativistic astrophysics, Quantum mechanics, Sun, Special relativity, Maryland, Keyphrase, Degree of freedom, Electrodynamics, Plane wave, Earth...
  • Three illustrated lectures given by Stephen Hawking as part of a series of six lectures with Roger Penrose on the nature of space and time sponsored by Princeton University Press.
  • FASER, the ForwArd Search ExpeRiment, is a proposed experiment dedicated to searching for light, extremely weakly-interacting particles at the LHC. Such particles may be produced in the LHC's high-energy collisions in large numbers in the far-forward region and then travel long distances through concrete and rock without interacting. They may then decay to visible particles in FASER, which is placed 480 m downstream of the ATLAS interaction point. In this work, we describe the FASER program. In its first stage, FASER is an extremely compact and inexpensive detector, sensitive to decays in a cylindrical region of radius R = 10 cm and length L = 1.5 m. FASER is planned to be constructed and installed in Long Shutdown 2 and will collect data during Run 3 of the 14 TeV LHC from 2021-23. If FASER is successful, FASER 2, a much larger successor with roughly R ~ 1 m and L ~ 5 m, could be constructed in Long Shutdown 3 and collect data during the HL-LHC era from 2026-35. FASER and FASER 2 have the potential to discover dark photons, dark Higgs bosons, heavy neutral leptons, axion-like particles, and many other long-lived particles, as well as provide new information about neutrinos, with potentially far-ranging implications for particle physics and cosmology. We describe the current status, anticipated challenges, and discovery prospects of the FASER program.
    ForwArd Search ExpeRiment, Large Hadron Collider, Interaction point, ATLAS Experiment at CERN, Long Lived Particle, High-luminosity LHC, Standard Model, Higgs boson, Decay volume, Hidden photon...
  • Ion fractional charge states, measured in situ in the heliosphere, depend on the properties of the plasma in the inner corona. As the ions travel outward in the solar wind and the electron density drops, the charge states remain essentially unaltered or "frozen in". Thus they can provide a powerful constraint on heating models of the corona and acceleration of the solar wind. We have implemented non-equilibrium ionization calculations into a 1D wave-turbulence-driven (WTD) hydrodynamic solar wind model and compared modeled charge states with the Ulysses 1994-5 in situ measurements. We have found that modeled charge state ratios of $C^{6+}/C^{5+}$ and $O^{7+}/O^{6+}$, among others, were too low compared with Ulysses measurements. However, a heuristic reduction of the plasma flow speed has been able to bring the modeled results in line with observations, though other ideas have been proposed to address this discrepancy. We discuss implications of our results and the prospect of including ion charge state calculations into our 3D MHD model of the inner heliosphere.
    Solar wind, Ulysses, Corona, Turbulence, Fractional charge, Heliosphere, Ionization, Solar corona, Fluid dynamics, Steady state...
  • In this paper we consider the optimal control of semilinear fractional PDEs with both spectral and integral fractional diffusion operators of order $2s$ with $s \in (0,1)$. We first prove the boundedness of solutions to both semilinear fractional PDEs under minimal regularity assumptions on domain and data. We next introduce an optimal growth condition on the nonlinearity to show the Lipschitz continuity of the solution map for the semilinear elliptic equations with respect to the data. We further apply our ideas to show existence of solutions to optimal control problems with semilinear fractional equations as constraints. Under the standard assumptions on the nonlinearity (twice continuously differentiable) we derive the first and second order optimality conditions.
    Weak solution, Embedding, Lipschitz continuity, Sobolev space, Hölder's inequality, Eigenfunction, Bounded operator, Order operator, Duality, Optimization...
  • Intermediate-mass black holes (BHs) in local dwarf galaxies are considered the relics of the early seed BHs. However, their growth might have been impacted by galaxy mergers and BH feedback so that they cannot be treated as tracers of the early seed BH population.
    Black hole, Dwarf galaxy, Supermassive black hole, Galaxy merger, Intermediate-mass black hole...
  • The low-latitude globular clusters Terzan 10 and Djorgovski 1 are projected towards the Galactic bulge, in a Galactic region highly affected by extinction. A discrepancy of a factor of ~2 exists in the literature in regards to the distance determination of these clusters. We revisit the colour-magnitude diagrams (CMDs) of these two globular clusters with the purpose of disentangling their distance-determination ambiguity and, for the first time, of determining their orbits to identify whether or not they are part of the bulge/bar region. We use Hubble Space Telescope CMDs, with the filters F606W from ACS and F160W from WFC3 for Terzan 10, and F606W and F814W from ACS for Djorgovski 1, and combine them with proper motions from Gaia Data Release 2. For the orbit integrations, we employed a steady Galactic model with a bar. For the first time the blue horizontal branch of these clusters is clearly resolved. We obtain reliable distances of d_Sun = 10.3 ± 1.0 kpc and 9.3 ± 0.5 kpc for Terzan 10 and Djorgovski 1, respectively, indicating that they are both currently located in the bulge volume. From Gaia DR2 proper motions, together with our new distance determinations and recent literature radial velocities, we are able to show that the two sample clusters have typical halo orbits that pass through the bulge/bar region but are not part of this component. For the first time, halo intruders are identified in the bulge.
    Hertzsprung-Russell diagram, Reddening, Globular cluster, Proper motion, Radial velocity, Hubble Space Telescope, Galactic Bulge, Star, Eccentricity, Wide Field Camera 3...
  • High Mass X-ray Binaries (HMXBs) have been revealed by a wealth of multi-wavelength observations, from the X-ray to the optical and infrared domains. After describing the three different kinds of HMXB, we focus on three HMXBs hosting supergiant stars: IGR J16320-4751, IGR J16465-4507 and IGR J16318-4848, respectively called "The Good", "The Bad" and "The Ugly". We review in these proceedings what observations of these sources have brought to light concerning our knowledge of HMXBs, and what still remains on the dark side. Many questions are still pending, related to accretion processes, stellar wind properties in these massive and active stars, and the overall evolution due to the transfer of mass and angular momentum between the companion star and the compact object. Future observations should be able to answer these questions, which constitute the dark side of HMXBs.
    High-mass x-ray binary, Stellar wind, Supergiant, Compact star, Accretion, Companion stars, Supergiant stars, Roche Lobe, Stellar classification, Spectral energy distribution...
  • This note summarizes the activities and the scientific and technical perspectives of the Laboratoire de Physique Nucleaire et de Hautes Energies (LPNHE) at Sorbonne University, Paris. Although the ESPP is specifically aimed at particle physics, we discuss in this note in parallel the three scientific lines developed at LPNHE (Particle Physics, Astroparticles, Cosmology), first with the current scientific activities, then for the future activities. However, our conclusions and recommendations are focused on the particle physics strategy.
    Higgs boson, Neutrino, Standard Model, Dark matter, Collider, LHCb experiment, High-luminosity LHC, Cosmology, ATLAS Experiment at CERN, Large Hadron Collider...
  • The inclusion of heavy neutral leptons in the Standard Model particle content could provide solutions to many open questions in particle physics and cosmology. The modification of the charged and neutral currents from active-sterile mixing of neutral leptons can provide novel signatures in Standard Model processes. We revisit the displaced vertex signature that could occur in collisions at the LHC via the decay of heavy neutral leptons with masses of a few GeV, emphasizing the implications of flavor, kinematics, inclusive production, and the number of these extra neutral fermions. We study in particular the implications for the parameter space sensitivity when all mixings to active flavors are taken into account. We also discuss alternative cases where the new particles are produced in a boosted regime.
    Sterile neutrino, Displaced vertices, Large Hadron Collider, Standard Model, Decay channels, QCD jet, Branching ratio, Muon, Charged lepton, Kinematics...
  • We give some new, simple results on the equation X^p + Y^p = Z^q.
  • We revisit the chiral kinetic equation from the high density effective theory approach, finding a chiral kinetic equation that differs from its field-theory counterpart at high orders in the $O(1/\mu)$ expansion, but agrees with the equation derived in on-shell effective field theory upon identification of the cutoff. Using the reparametrization transformation properties of the effective theory, we show that the difference between the kinetic equations from the two approaches is in fact expected: it is simply due to different choices of degrees of freedom by the effective theory and the field theory. We also show that they give equivalent descriptions of the dynamics of chiral fermions.
    Effective field theory, Field theory, Degree of freedom, Wigner distribution function, High Density Effective Theory, Kinetic equation, Transport equation, Chiral kinetic theory, Effective theory, Chiral fermion...
  • An isolated star moving supersonically through a uniform gas accretes material from its gravitationally induced wake. The rate of accretion is set by the accretion radius of the star and is well described by classical Bondi-Hoyle-Lyttleton theory. Stars, however, are not born in isolation. They form in clusters, where they accrete material that is influenced by all the stars in the cluster. We perform three-dimensional hydrodynamic simulations of clusters of individual accretors embedded in a uniform-density wind in order to study how the accretion rates experienced by individual cluster members are altered by the properties of the ambient gas and the cluster itself. We study accretion as a function of the number of cluster members, the mean separation between them, and the size of their individual accretion radii. We determine the effect of these key parameters on the aggregate and individual accretion rates, which we compare to analytic predictions. We show that when the accretion radii of the individual objects in the cluster substantially overlap, the surrounding gas is effectively accreted into the collective potential of the cluster prior to being accreted onto the individual stars. We find that individual cluster members can accrete drastically more than they would in isolation, in particular when the flow is able to cool efficiently. This effect could potentially modify the luminosity of accreting compact objects in star clusters and could lead to the rejuvenation of young star clusters as well as globular clusters with low-inclination and low-eccentricity orbits.
    Accretion, Mass accretion rate, Star, Cooling, Star cluster, Globular cluster, Fluid dynamics, Bow shock, Star systems, Pressure support...
  • We present constraints on Horndeski gravity from a combined analysis of cosmic shear, galaxy-galaxy lensing and galaxy clustering from $450\,\mathrm{deg}^2$ of the Kilo-Degree Survey (KiDS) and the Galaxy And Mass Assembly (GAMA) survey, including all cross-correlations. The Horndeski class of dark energy/modified gravity models includes the majority of universally coupled extensions to $\Lambda$CDM with one scalar degree of freedom in addition to the metric. We study the functions of time that fully describe the evolution of linear perturbations in Horndeski gravity, and set constraints on parameters that describe their time evolution. Our results are compatible throughout with a $\Lambda$CDM model. Assuming proportionality of the Horndeski functions $\alpha_B$ and $\alpha_M$ (describing the braiding of the scalar field with the metric and the Planck mass run rate, respectively) to the dark energy density fraction $\Omega_{\mathrm{DE}}(a) = 1 - \Omega_{\mathrm{m}}(a)$, we find for the proportionality coefficients $\hat{\alpha}_B = 0.20_{-0.33}^{+0.20} \,$ and $\, \hat{\alpha}_M = 0.25_{-0.29}^{+0.19}$. Our value of $S_8 \equiv \sigma_8 \sqrt{\Omega_{\mathrm{m}}/0.3}$ is in better agreement with the $Planck$ estimate when measured in the enlarged Horndeski parameter space than in a pure $\Lambda$CDM scenario. In our Horndeski gravity analysis of cosmic shear alone, we report a downward shift of the $S_8$ best fit value from the $Planck$ measurement of $\Delta S_8 = 0.048_{-0.056}^{+0.059}$, compared to $\Delta S_8 = 0.091_{-0.045}^{+0.046}$ in $\Lambda$CDM. In the joint three-probe analysis, we find $\Delta S_8 = 0.016_{-0.046}^{+0.048}$ in Horndeski gravity and $\Delta S_8 = 0.059_{-0.039}^{+0.040}$ in $\Lambda$CDM. Our likelihood code for multi-probe analysis in both $\Lambda$CDM and Horndeski gravity is made publicly available.
    Large scale structure, Horndeski gravity, Cosmic shear, KiDS survey, Modified gravity, Cosmic microwave background, Dark energy, Galaxy And Mass Assembly survey, Cosmological parameters, Galaxy galaxy lensing...
  • The presence of primordial magnetic fields increases the minimum halo mass in which star formation is possible at high redshifts. Estimates of the dynamical mass of ultra-faint dwarf galaxies (UFDs) within their half-light radius constrain their virialized halo mass before their infall into the Milky Way. The inferred halo mass and formation redshift of the UFDs place upper bounds on the primordial comoving magnetic field, $B_0$. We derive an upper limit of $0.50\pm 0.086$ ($0.31\pm 0.04$) nG on $B_0$, assuming the average formation redshift of the UFD host halos is $z_{\rm form}=10$ (20), respectively.
    Ultra-faint dwarf spheroidal galaxy, Cosmological magnetic field, Virial mass, Half-light radius, Star formation, Milky Way, Redshift, Mass, Magnetic field...
  • Sampling logconcave functions arising in statistics and machine learning has been a subject of intensive study. Recent developments include analyses for Langevin dynamics and Hamiltonian Monte Carlo (HMC). While both approaches have dimension-independent bounds for the underlying $\mathit{continuous}$ processes under sufficiently strong smoothness conditions, the resulting discrete algorithms have complexity and number of function evaluations growing with the dimension. Motivated by this problem, in this paper, we give a general algorithm for solving multivariate ordinary differential equations whose solution is close to the span of a known basis of functions (e.g., polynomials or piecewise polynomials). The resulting algorithm has polylogarithmic depth and essentially tight runtime - it is nearly linear in the size of the representation of the solution. We apply this to the sampling problem to obtain a nearly linear implementation of HMC for a broad class of smooth, strongly logconcave densities, with the number of iterations (parallel depth) and gradient evaluations being $\mathit{polylogarithmic}$ in the dimension (rather than polynomial as in previous work). This class includes the widely-used loss function for logistic regression with incoherent weight matrices and has been subject of much study recently. We also give a faster algorithm with $ \mathit{polylogarithmic~depth}$ for the more general and standard class of strongly convex functions with Lipschitz gradient. These results are based on (1) an improved contraction bound for the exact HMC process and (2) logarithmic bounds on the degree of polynomials that approximate solutions of the differential equations arising in implementing HMC.
    Collocation method, Logistic regression, Picard, Langevin dynamics, Ordinary differential equations, Hamiltonian, Machine learning, Statistics, Gaussian distribution, Brownian motion...
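    For reference, the discrete algorithm being accelerated: a plain leapfrog HMC step with a Metropolis correction, on a toy strongly logconcave target. This is the textbook baseline, not the paper's collocation-based ODE solver.

```python
import numpy as np

rng = np.random.default_rng(0)

def hmc_sample(grad_U, U, x0, eps=0.1, n_leapfrog=20, n_samples=1000):
    """Plain leapfrog HMC for a smooth, strongly logconcave target
    proportional to exp(-U(x))."""
    x = np.array(x0, dtype=float)
    samples = []
    for _ in range(n_samples):
        p = rng.standard_normal(x.shape)           # resample momentum
        x_new, p_new = x.copy(), p.copy()
        # leapfrog integration of the Hamiltonian dynamics
        p_new -= 0.5 * eps * grad_U(x_new)
        for _ in range(n_leapfrog - 1):
            x_new += eps * p_new
            p_new -= eps * grad_U(x_new)
        x_new += eps * p_new
        p_new -= 0.5 * eps * grad_U(x_new)
        # Metropolis-Hastings correction for the discretisation error
        dH = (U(x_new) - U(x)) + 0.5 * (p_new @ p_new - p @ p)
        if np.log(rng.random()) < -dH:
            x = x_new
        samples.append(x.copy())
    return np.array(samples)

# Toy strongly logconcave target: an anisotropic Gaussian.
cov_inv = np.diag([1.0, 10.0])
U = lambda x: 0.5 * x @ cov_inv @ x
grad_U = lambda x: cov_inv @ x
s = hmc_sample(grad_U, U, x0=np.zeros(2))
print(s.mean(axis=0), s.var(axis=0))   # expect ~[0, 0] and ~[1, 0.1]
```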
  • A promising class of generative models maps points from a simple distribution to a complex distribution through an invertible neural network. Likelihood-based training of these models requires restricting their architectures to allow cheap computation of Jacobian determinants. Alternatively, the Jacobian trace can be used if the transformation is specified by an ordinary differential equation. In this paper, we use Hutchinson's trace estimator to give a scalable unbiased estimate of the log-density. The result is a continuous-time invertible generative model with unbiased density estimation and one-pass sampling, while allowing unrestricted neural network architectures. We demonstrate our approach on high-dimensional density estimation, image generation, and variational inference, achieving the state-of-the-art among exact likelihood methods with efficient sampling.
    Architecture, Ordinary differential equations, Neural network, Generative model, Statistical estimator, Inference, Initial value problem, Hidden layer, Rank, Activation function...
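    The core estimator is easy to state in a few lines. A sketch assuming a Jacobian-vector product is available (in models of this kind it comes from reverse-mode autodiff); here the Jacobian of a toy dynamics function is formed explicitly so the estimate can be checked against the exact trace.

```python
import numpy as np

rng = np.random.default_rng(0)

def hutchinson_trace(jvp, dim, n_probes=256):
    """Unbiased Hutchinson estimate of tr(J) from Jacobian-vector
    products only: E[v^T J v] = tr(J) for Rademacher probes v."""
    total = 0.0
    for _ in range(n_probes):
        v = rng.choice([-1.0, 1.0], size=dim)
        total += v @ jvp(v)
    return total / n_probes

# Toy dynamics f(z) = tanh(W z); its Jacobian is diag(1 - tanh^2(W z)) W.
dim = 5
W = rng.standard_normal((dim, dim))
z = rng.standard_normal(dim)
J = np.diag(1.0 - np.tanh(W @ z) ** 2) @ W

print(hutchinson_trace(lambda v: J @ v, dim))   # stochastic estimate
print(np.trace(J))                              # exact value for comparison
```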
  • We present a mathematical model for geometric deep learning based upon a scattering transform defined over manifolds, which generalizes the wavelet scattering transform of Mallat. This geometric scattering transform is (locally) invariant to isometry group actions, and we conjecture that it is stable to actions of the diffeomorphism group.
    Manifold, Isometry, Diffeomorphism, Wavelet, Convolutional neural network, Deep learning, Propagator, Isometry group, Graph, Group action...
  • In many statistical applications that concern mathematical psychologists, the concept of Fisher information plays an important role. In this tutorial we clarify the concept of Fisher information as it manifests itself across three different statistical paradigms. First, in the frequentist paradigm, Fisher information is used to construct hypothesis tests and confidence intervals using maximum likelihood estimators; second, in the Bayesian paradigm, Fisher information is used to define a default prior; lastly, in the minimum description length paradigm, Fisher information is used to measure model complexity.
    Fisher information, Maximum likelihood estimate, Statistical estimator, Statistics, Gaussian distribution, Goodness of fit, Confidence interval, Entropy, Frequentist approach, Bayesian...
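    A worked Bernoulli example of the three uses listed above, with $I(\theta) = 1/(\theta(1-\theta))$; the numbers are illustrative.

```python
import numpy as np

def bernoulli_fisher(theta):
    """Fisher information of a single Bernoulli(theta) observation:
    I(theta) = 1 / (theta * (1 - theta))."""
    return 1.0 / (theta * (1.0 - theta))

# 1. Frequentist use: Wald confidence interval around the MLE,
#    with standard error 1 / sqrt(n * I(theta_hat)).
n, k = 100, 62
theta_hat = k / n
se = 1.0 / np.sqrt(n * bernoulli_fisher(theta_hat))   # = sqrt(th(1-th)/n)
print(f"95% Wald CI: {theta_hat:.3f} +/- {1.96 * se:.3f}")

# 2. Bayesian use: the Jeffreys default prior is proportional to
#    sqrt(I(theta)); for the Bernoulli model that is Beta(1/2, 1/2).

# 3. MDL use: the integral of sqrt(I(theta)) over the parameter space
#    measures model complexity; for the Bernoulli family it equals pi.
theta = np.linspace(0.0005, 0.9995, 9999)
complexity = np.sqrt(bernoulli_fisher(theta)).sum() * (theta[1] - theta[0])
print(f"complexity integral ~ {complexity:.3f} (exact: pi ~ {np.pi:.3f})")
```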
  • Solomonoff's general theory of inference and the Minimum Description Length principle formalize Occam's razor, and hold that a good model of data is a model that is good at losslessly compressing the data, including the cost of describing the model itself. Deep neural networks might seem to go against this principle given the large number of parameters to be encoded. We demonstrate experimentally the ability of deep neural networks to compress the training data even when accounting for parameter encoding. The compression viewpoint originally motivated the use of variational methods in neural networks. Unexpectedly, we found that these variational methods provide surprisingly poor compression bounds, despite being explicitly built to minimize such bounds. This might explain the relatively poor practical performance of variational methods in deep learning. On the other hand, simple incremental encoding methods yield excellent compression values on deep networks, vindicating Solomonoff's approach.
    Deep learning, Variational method, Bayesian, Neural network, Mutual information, Alice and Bob, Inference, Precision, Classification, Architecture...
  • We present a formal measure-theoretical theory of neural networks (NN) built on probability coupling theory. Our main contributions are summarized as follows. * Built on the formalism of probability coupling theory, we derive an algorithm framework, named Hierarchical Measure Group and Approximate System (HMGAS), nicknamed S-System, that is designed to learn the complex hierarchical, statistical dependency in the physical world. * We show that NNs are special cases of S-System when the probability kernels assume certain exponential family distributions. Activation functions are derived formally. We further endow geometry on NNs through information geometry, show that intermediate feature spaces of NNs are stochastic manifolds, and prove that "distance" between samples is contracted as layers stack up. * S-System shows that NNs are inherently stochastic, and under a set of realistic boundedness and diversity conditions, it enables us to prove that for large-size nonlinear deep NNs with a class of losses, including the hinge loss, all local minima are global minima with zero loss errors, and regions around the minima are flat basins where all eigenvalues of the Hessians are concentrated around zero, using tools and ideas from mean field theory, random matrix theory, and nonlinear operator equations. * The S-System, the information-geometry structure and the optimization behaviors combined complete the analogy between the Renormalization Group (RG) and NNs. It shows that a NN is a complex adaptive system that estimates the statistical dependency of microscopic objects, e.g., pixels, at multiple scales. Unlike the clear-cut physical quantities produced by RG in physics, e.g., temperature, NNs renormalize/recompose manifolds emerging through learning/optimization that divide the sample space into highly semantically meaningful groups that are dictated by supervised labels (in supervised NNs).
    Manifold, Optimization, Activation function, Information geometry, Renormalization group, Random matrix, Statistics, Diffeomorphism, Neural network, Complex systems...
  • The aim of this paper is to discuss the use of Haar scattering networks, a very simple architecture that naturally supports a large number of stacked layers, yet with very few parameters, in a relatively broad set of pattern recognition problems, including regression and classification tasks. This architecture basically consists of stacking convolutional filters, which can be thought of as a generalization of Haar wavelets, followed by non-linear operators that aim to extract symmetries and invariances, which are later fed into a classification/regression algorithm. We show that good results can be obtained with the proposed method for both kinds of task. We outperformed the best available algorithms in 4 out of 18 important data classification problems, and obtained more robust performance than ARIMA and ETS time series methods in regression problems for data with strong periodicities.
    Classification, Regression, Architecture, Time Series, Pattern recognition, Haar wavelet, Artificial neural network, Convolutional neural network, Wavelet transform, Wavelet...
  • The dimension of an inertial manifold for a chaotic attractor of a spatially distributed system is estimated using an autoencoder neural network. The inertial manifold is a low-dimensional manifold in which the chaotic attractor is embedded. The autoencoder maps system state vectors onto themselves, letting them pass through an inner state with a reduced dimension. The training process of the autoencoder is shown to depend dramatically on the reduced dimension: the learning curve saturates when the dimension is too small and decays when it is sufficient for lossless information transfer. The smallest sufficient value is taken as the dimension of the inertial manifold, and the autoencoder implements a mapping onto the inertial manifold and back. The correctness of the computed dimension is confirmed by its remarkable coincidence with the number of covariant Lyapunov vectors with vanishing pairwise angles. These vectors are called physical modes. Unlike the residual vectors, which never have zero angles, the physical modes are known to span a tangent subspace of the inertial manifold.
    Inertial manifold, Autoencoder, Attractor, Manifold, Neural network, Ginzburg Landau, Lyapunov vector, Phase space, Machine learning, Optimization...
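    A minimal sketch of the procedure in PyTorch, on synthetic data with a known two-dimensional manifold: train autoencoders of increasing bottleneck width and watch the final reconstruction loss drop sharply once the bottleneck reaches the manifold dimension. All layer sizes and learning rates are placeholder choices.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

def final_loss_for_dim(data, bottleneck_dim, epochs=200):
    """Train a small autoencoder with the given bottleneck width and
    return its final reconstruction loss."""
    d = data.shape[1]
    model = nn.Sequential(
        nn.Linear(d, 32), nn.Tanh(),
        nn.Linear(32, bottleneck_dim),          # inner reduced state
        nn.Linear(bottleneck_dim, 32), nn.Tanh(),
        nn.Linear(32, d),
    )
    opt = torch.optim.Adam(model.parameters(), lr=1e-2)
    for _ in range(epochs):
        opt.zero_grad()
        loss = ((model(data) - data) ** 2).mean()
        loss.backward()
        opt.step()
    return loss.item()

# Toy "attractor": a 2-D manifold embedded nonlinearly in R^8.
t = torch.rand(2000, 2) * 6 - 3
data = torch.cat([t, torch.sin(t), torch.cos(t),
                  t.prod(1, keepdim=True),
                  (t ** 2).sum(1, keepdim=True)], dim=1)
for m in range(1, 5):
    print(m, final_loss_for_dim(data, m))   # loss typically drops at m = 2
```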
  • Computing equilibrium states in condensed-matter many-body systems, such as solvated proteins, is a long-standing challenge. Lacking methods for generating statistically independent equilibrium samples directly, vast computational effort is invested in simulating these systems in small steps, e.g., using Molecular Dynamics. Combining deep learning and statistical mechanics, we here develop Boltzmann Generators, which are shown to generate statistically independent samples of equilibrium states of representative condensed-matter systems and complex polymers. Boltzmann Generators use neural networks to learn a coordinate transformation of the complex configurational equilibrium distribution to a distribution that can be easily sampled. Accurate computation of free energy differences and discovery of new system states are demonstrated, providing a new statistical mechanics tool that performs orders of magnitude faster than standard simulation methods.
    Molecular dynamics, Boltzmann distribution, Neural network, Statistical mechanics, Polymers, Condensed matter system, Deep learning, Entropy, Protein, Many-body systems...
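    A deliberately tiny stand-in for the idea, with a one-parameter invertible map in place of the deep flow and a 1-D double well in place of a molecular energy: the training objective is the reverse KL, E_z[u(f(z)) - log|det J|], and samples are reweighted to the exact Boltzmann distribution afterwards.

```python
import torch

def energy(x):
    """Placeholder 1-D double-well potential; in the paper u(x) is the
    energy of a solvated many-body system."""
    return (x ** 2 - 1.0) ** 2 / 0.25

# One trainable parameter s defines the invertible map x = exp(s) * z,
# standing in for a deep invertible network; its log|det J| is just s.
s = torch.zeros(1, requires_grad=True)
opt = torch.optim.Adam([s], lr=0.05)
for _ in range(500):
    z = torch.randn(1024)
    x = torch.exp(s) * z
    loss = (energy(x) - s).mean()   # reverse KL: E_z[u(f(z)) - log|det J|]
    opt.zero_grad()
    loss.backward()
    opt.step()

# Sample and reweight to the exact Boltzmann distribution exp(-u(x)):
with torch.no_grad():
    z = torch.randn(10_000)
    x = torch.exp(s) * z
    log_q = -0.5 * z ** 2 - s            # pushforward log-density (+ const)
    log_w = -energy(x) - log_q           # Boltzmann importance log-weights
    w = torch.softmax(log_w, dim=0)      # self-normalised weights
    print(float((w * x ** 2).sum()))     # reweighted <x^2>, roughly ~1
```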
  • Momentum-based acceleration of stochastic gradient descent (SGD) is widely used in deep learning. We propose the quasi-hyperbolic momentum algorithm (QHM) as an extremely simple alteration of momentum SGD, averaging a plain SGD step with a momentum step. We describe numerous connections to and identities with other algorithms, and we characterize the set of two-state optimization algorithms that QHM can recover. Finally, we propose a QH variant of Adam called QHAdam, and we empirically demonstrate that our algorithms lead to significantly improved training in a variety of settings, including a new state-of-the-art result on WMT16 EN-DE. We hope that these empirical results, combined with the conceptual and practical simplicity of QHM and QHAdam, will spur interest from both practitioners and researchers. Code is immediately available.
    Optimization, Deep learning, Statistical estimator, D-term, Hyperparameter, Least squares, Covariance, Entropy, Scheduling, Stochastic gradient descent...
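    The QHM update itself is a two-liner; a sketch with the defaults recommended in the paper (nu = 0.7, beta = 0.999), where nu = 1 recovers ordinary momentum and nu = 0 plain SGD.

```python
import numpy as np

def qhm_update(theta, g_buf, grad, lr=0.1, beta=0.999, nu=0.7):
    """One quasi-hyperbolic momentum step: the update direction averages
    a plain SGD step (weight 1 - nu) with a momentum step (weight nu)."""
    g_buf = beta * g_buf + (1.0 - beta) * grad          # momentum buffer
    theta = theta - lr * ((1.0 - nu) * grad + nu * g_buf)
    return theta, g_buf

# Toy problem: minimise 0.5 * ||x||^2 with noisy gradients.
rng = np.random.default_rng(0)
theta, g_buf = np.ones(10), np.zeros(10)
for _ in range(500):
    grad = theta + 0.1 * rng.standard_normal(10)        # stochastic gradient
    theta, g_buf = qhm_update(theta, g_buf, grad)
print(np.linalg.norm(theta))   # should be close to 0
```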
  • We propose a new method for learning from a single demonstration to solve hard exploration tasks like the Atari game Montezuma's Revenge. Instead of imitating human demonstrations, as proposed in other recent works, our approach is to maximize rewards directly. Our agent is trained using off-the-shelf reinforcement learning, but starts every episode by resetting to a state from a demonstration. By starting from such demonstration states, the agent requires much less exploration to learn a game compared to when it starts from the beginning of the game at every episode. We analyze reinforcement learning for tasks with sparse rewards in a simple toy environment, where we show that the run-time of standard RL methods scales exponentially in the number of states between rewards. Our method reduces this to quadratic scaling, opening up many tasks that were previously infeasible. We then apply our method to Montezuma's Revenge, for which we present a trained agent achieving a high-score of 74,500, better than any previously published result.
    Reinforcement learning, Recurrent neural network, Hyperparameter, Robotics, Diamond, Q-learning, Entropy, YouTube, Hidden state, Convolutional neural network...
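    A miniature of the idea on a sparse-reward chain MDP, with tabular Q-learning standing in for the paper's policy-gradient agent: resetting episodes to states along a single demonstration lets values propagate backwards from the reward, while resets from the start state essentially never find it.

```python
import numpy as np

rng = np.random.default_rng(0)

# Chain MDP: the only reward sits at state N, so exploration from the
# start takes exponentially long; resetting to demonstration states
# (the path 0, 1, ..., N) starts the agent near the reward instead.
N, HORIZON = 30, 60
demo = np.arange(N + 1)

def act(Q, s, eps):
    if rng.random() < eps or Q[s, 0] == Q[s, 1]:
        return int(rng.integers(2))             # explore / break ties
    return int(Q[s].argmax())

def run(reset_from_demo, episodes=3000, eps=0.1):
    Q = np.zeros((N + 1, 2))                    # actions: 0 = left, 1 = right
    for _ in range(episodes):
        s = int(demo[rng.integers(len(demo))]) if reset_from_demo else 0
        for _ in range(HORIZON):
            a = act(Q, s, eps)
            s2 = min(s + 1, N) if a == 1 else max(s - 1, 0)
            r = 1.0 if s2 == N else 0.0
            Q[s, a] += 0.5 * (r + 0.99 * Q[s2].max() - Q[s, a])
            s = s2
            if r > 0:
                break
    # evaluate the greedy policy from the true start state
    s, steps = 0, 0
    while s != N and steps < 4 * N:
        s = min(s + 1, N) if Q[s, 1] > Q[s, 0] else max(s - 1, 0)
        steps += 1
    return steps

print("demo resets: ", run(True), "steps to goal")    # typically N = 30
print("start resets:", run(False), "steps to goal")   # typically capped: no reward found
```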
  • Utilizing recently introduced concepts from statistics and quantitative risk management, we present a general variant of Batch Normalization (BN) that offers accelerated convergence of Neural Network training compared to conventional BN. In general, we show that the mean and standard deviation are not always the most appropriate choice for the centering and scaling procedure within the BN transformation, particularly if ReLU follows the normalization step. We present a Generalized Batch Normalization (GBN) transformation, which can utilize a variety of alternative deviation measures for scaling and statistics for centering, choices which naturally arise from the theory of generalized deviation measures and risk theory in general. When used in conjunction with the ReLU non-linearity, the underlying risk theory suggests natural, arguably optimal choices for the deviation measure and statistic. Utilizing the suggested deviation measure and statistic, we show experimentally that training is accelerated more than with conventional BN, often with improved error rates as well. Overall, we propose a more flexible BN transformation supported by a complementary theoretical framework that can potentially guide design choices.
    Statistics, Neural network, Deep Neural Networks, Architecture, Conjunction, Optimization, CVaR, Decision problem, Portfolio, Risk analysis...
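    A sketch of the GBN transformation with one illustrative pair of choices (median centering, mean-absolute-deviation scaling); the paper derives its preferred statistic and deviation measure from risk theory rather than fixing this pair.

```python
import numpy as np

def generalized_batch_norm(x, center_stat=np.median, dev_measure=None, eps=1e-5):
    """Sketch of a GBN-style transformation: normalise each feature with
    a general centering statistic and deviation measure instead of BN's
    mean and standard deviation."""
    if dev_measure is None:
        dev_measure = lambda z: np.mean(np.abs(z), axis=0)
    c = center_stat(x, axis=0)       # per-feature centering statistic
    d = dev_measure(x - c)           # per-feature deviation measure
    return (x - c) / (d + eps)

# Heavy-tailed activations, where mean/std are poor summaries:
batch = np.random.default_rng(0).standard_normal((64, 8)) ** 3
out = generalized_batch_norm(batch)
print(np.median(out, axis=0).round(3))          # centered at 0 per feature
print(np.mean(np.abs(out), axis=0).round(3))    # unit mean absolute deviation
```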
  • The Sinkhorn distance, a variant of the Wasserstein distance with entropic regularization, is an increasingly popular tool in machine learning and statistical inference. We give a simple, practical, parallelizable algorithm NYS-SINK, based on Nyström approximation, for computing Sinkhorn distances on a massive scale. As we show in numerical experiments, our algorithm easily computes Sinkhorn distances on data sets hundreds of times larger than can be handled by state-of-the-art approaches. We also give provable guarantees establishing that the running time and memory requirements of our algorithm adapt to the intrinsic dimension of the underlying data.
    Rank, Manifold, Effective dimension, Machine learning, Optimal transport, Regularization, Intrinsic dimension, Statistical inference, Singular value, Cholesky decomposition...
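    The computational idea in a few lines: run ordinary Sinkhorn scaling iterations, but with the Gibbs kernel replaced by a low-rank Nyström factorisation so each matrix-vector product costs O(nr) instead of O(n^2). The landmark count and clipping below are ad hoc; the paper adds adaptive rank selection and approximation guarantees.

```python
import numpy as np

rng = np.random.default_rng(0)

def gauss_kernel(p, q, eta):
    return np.exp(-((p[:, None] - q[None]) ** 2).sum(-1) / eta)

def sinkhorn_lowrank(U, V, a, b, n_iter=300):
    """Sinkhorn iterations with the Gibbs kernel K approximated by the
    low-rank factorisation K ~ U @ V.T, so a matvec costs O(n r)."""
    u, v = np.ones(len(a)), np.ones(len(b))
    for _ in range(n_iter):
        u = a / np.maximum(U @ (V.T @ v), 1e-12)   # clip for positivity
        v = b / np.maximum(V @ (U.T @ u), 1e-12)
    return u, v          # implicit plan: diag(u) @ (U @ V.T) @ diag(v)

n, eta = 500, 0.1
x, y = rng.random((n, 2)), rng.random((n, 2))
a = b = np.full(n, 1.0 / n)

z = rng.random((60, 2))                                  # Nystrom landmarks
U = gauss_kernel(x, z, eta) @ np.linalg.pinv(gauss_kernel(z, z, eta))
V = gauss_kernel(y, z, eta)                              # K ~ U @ V.T

u, v = sinkhorn_lowrank(U, V, a, b)
row_marginal = u * (U @ (V.T @ v))
print(np.abs(row_marginal - a).max())   # small residual -> marginals matched
```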
  • This paper describes the discipline of distance metric learning, a branch of machine learning that aims to learn distances from the data. Distance metric learning can be useful to improve similarity learning algorithms, and also has applications in dimensionality reduction. We describe the distance metric learning problem and analyze its main mathematical foundations. We discuss some of the most popular distance metric learning techniques used in classification, showing their goals and the required information to understand and use them. Furthermore, we present a Python package that collects a set of 17 distance metric learning techniques explained in this paper, with some experiments to evaluate the performance of the different algorithms. Finally, we discuss several possibilities of future work in this topic.
    Optimization, Nearest-neighbor site, Training set, Classification, Principal component analysis, Rank, Software, Convex set, Euclidean distance, Feature space...
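    A bare-bones illustration of the Mahalanobis-metric setup the paper describes (learn a linear map L so that d(x, y) = ||Lx - Ly|| pulls same-class pairs together and pushes different-class pairs apart); this toy gradient loop is not one of the 17 algorithms in the package.

```python
import numpy as np

rng = np.random.default_rng(0)

# Two classes that overlap in a noisy third dimension: a good metric
# should learn to down-weight that dimension.
X = np.vstack([rng.normal([0, 0, 0], [1, 1, 5], (50, 3)),
               rng.normal([2, 2, 0], [1, 1, 5], (50, 3))])
labels = np.repeat([0, 1], 50)

L = np.eye(3)                      # d(x, y) = ||L x - L y||
lr, margin = 5e-4, 3.0
for _ in range(500):
    i, j = rng.integers(100, size=2)
    diff = (X[i] - X[j]).reshape(3, 1)
    d = np.linalg.norm(L @ diff)
    # gradient of d^2 with respect to L is 2 (L diff) diff^T
    if labels[i] == labels[j]:
        L -= lr * 2.0 * (L @ diff) @ diff.T          # pull together
    elif d < margin:
        L += lr * 2.0 * (L @ diff) @ diff.T          # push apart
print("same-class distance: ", np.linalg.norm(L @ (X[0] - X[1])))
print("cross-class distance:", np.linalg.norm(L @ (X[0] - X[99])))
```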
  • We develop a quantum theory of magnetic skyrmions and antiskyrmions in a spin-1/2 Heisenberg magnet with frustrating next-nearest neighbor interactions. Using exact diagonalization we show numerically that a quantum skyrmion exists as a stable many-magnon bound state and investigate its quantum numbers. We then derive a phenomenological Schrödinger equation for the quantum skyrmion and its internal degrees of freedom. We find that quantum skyrmions have highly unusual properties. Their bandwidth is exponentially small and arises from tunneling processes between skyrmion and antiskyrmion. The band structure changes both qualitatively and quantitatively when a single spin is added to or removed from the quantum skyrmion, reflecting a locking of angular momentum and spin quantum numbers characteristic of skyrmions. Additionally, while for weak forces the quantum skyrmion is accelerated parallel to the force, it moves in a perpendicular direction for stronger fields.
    Skyrmion, Nearest-neighbor site, Degree of freedom, Ferromagnet, Bound state, Magnon, Quantum theory, Helicity, Spin-orbit interaction, Spin quantum number...
  • This book covers the history of probability up to Kolmogorov, with essential additional coverage of statistics up to Fisher. Based on my work of ca. 50 years, it is the only book of its kind. Gorrochurn (2016) is similar, but his study of events preceding Laplace is absolutely unsatisfactory. Hald (1990; 1998) are worthy indeed, but the Continental direction of statistics (Russian and German statisticians) is omitted, it is impossible to find out what was contained in any particular memoir of Laplace, and the explanation does not always explain the path from, say, Poisson to a modern interpretation of his results. Finally, the reader ought to master modern mathematical statistics. I included many barely known facts and conclusions, e.g., Gauss' justification of least squares (yes!), the merits of Bayes (again, yes!), the unforgivable mistake of Laplace, the work of Chebyshev and his students (merits and failures), etc. The book covers an extremely wide field and is targeted at the same readers as any other book on the history of science. The mathematical treatment is not as difficult as it is for readers of Hald.
    Statistics, Least squares, Probability, Theory, Event, Gauss, Field...
  • We use ten different galaxy formation scenarios from the EAGLE suite of $\Lambda$CDM hydrodynamical simulations to assess the impact of feedback mechanisms in galaxy formation and compare these to observed strong gravitational lenses. To compare observations with simulations, we create strong lenses with $M_\star > 10^{11}\,M_\odot$ with the appropriate resolution and noise level, and model them with an elliptical power-law mass model to constrain their total mass density slope. We also obtain the mass-size relation of the simulated lens-galaxy sample. We find significant variation in the total mass density slope at the Einstein radius and in the projected stellar mass-size relation, mainly due to different implementations of stellar and AGN feedback. We find that for lens selected galaxies, models with either too weak or too strong stellar and/or AGN feedback fail to explain the distribution of observed mass-density slopes, with the counter-intuitive trend that increasing the feedback steepens the mass density slope around the Einstein radius ($\approx$ 3-10 kpc). Models in which stellar feedback becomes inefficient at high gas densities, or weaker AGN feedback with a higher duty cycle, produce strong lenses with total mass density slopes close to isothermal (i.e. $-\mathrm{d}\log\rho/\mathrm{d}\log r \approx 2.0$) and slope distributions statistically agreeing with observed strong lens galaxies in SLACS and BELLS. Agreement is only slightly worse with the more heterogeneous SL2S lens galaxy sample. Observations of strong-lens selected galaxies thus appear to favor models with relatively weak feedback in massive galaxies.
    Gravitational lens galaxy, Galaxy Formation, AGN feedback, Strong gravitational lensing, Einstein radius, Galaxy, BELLS survey, EAGLE simulation project, Hydrodynamical simulations, Stellar feedback...
  • This article gives an overview of the different database encryption choices in SQL Server and of which one works best in which situation. In today's world, data is more crucial than expensive hardware. No one wants their personal data to be compromised, and the same holds for businesses, which do not want their data to be handled inappropriately and leak outside the business. To help protect public rights and safety, the European Union recently introduced the strict rules and regulations of the GDPR (General Data Protection Regulation).
    Encryption
  • Convolutional Neural Networks (CNNs) have recently achieved remarkably strong performance on the practically important task of sentence classification (Kim 2014; Kalchbrenner 2014; Johnson 2014). However, these models require practitioners to specify an exact model architecture and set accompanying hyperparameters, including the filter region size, regularization parameters, and so on. It is currently unknown how sensitive model performance is to changes in these configurations for the task of sentence classification. We thus conduct a sensitivity analysis of one-layer CNNs to explore the effect of architecture components on model performance; our aim is to distinguish between important and comparatively inconsequential design decisions for sentence classification. We focus on one-layer CNNs (to the exclusion of more complex models) due to their comparative simplicity and strong empirical performance, which make them a modern standard baseline method akin to support vector machines (SVMs) and logistic regression. We derive practical advice from our extensive empirical results for those interested in getting the most out of CNNs for sentence classification in real world settings.
    Convolutional neural network, Classification, Architecture, Word vectors, Hyperparameter, Optimization, Support vector machine, Activation function, Regularization, Feature vector...
  • We report on a series of experiments with convolutional neural networks (CNN) trained on top of pre-trained word vectors for sentence-level classification tasks. We show that a simple CNN with little hyperparameter tuning and static vectors achieves excellent results on multiple benchmarks. Learning task-specific vectors through fine-tuning offers further gains in performance. We additionally propose a simple modification to the architecture to allow for the use of both task-specific and static vectors. The CNN models discussed herein improve upon the state of the art on 4 out of 7 tasks, which include sentiment analysis and question classification.
    Convolutional neural network, Word vectors, Classification, Architecture, Neural network, Deep learning, Hyperparameter, Computational linguistics, Google.com, Google+...
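    A minimal PyTorch sketch of the architecture described: an embedding layer (where pre-trained word2vec vectors would be loaded), parallel convolutions with several filter region sizes, max-over-time pooling, dropout, and a linear classifier. Vocabulary size and filter counts are placeholders.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SentenceCNN(nn.Module):
    """Sketch of a one-layer CNN for sentence classification."""
    def __init__(self, vocab=10000, emb=300, classes=2,
                 sizes=(3, 4, 5), n_filters=100):
        super().__init__()
        self.embed = nn.Embedding(vocab, emb)   # load word2vec weights here;
        # freeze them for the "static" variant:
        # self.embed.weight.requires_grad_(False)
        self.convs = nn.ModuleList(
            [nn.Conv1d(emb, n_filters, k) for k in sizes])
        self.drop = nn.Dropout(0.5)
        self.fc = nn.Linear(n_filters * len(sizes), classes)

    def forward(self, tokens):                  # tokens: (batch, seq_len)
        x = self.embed(tokens).transpose(1, 2)  # -> (batch, emb, seq_len)
        # one feature per filter via max-over-time pooling
        pooled = [F.relu(c(x)).max(dim=2).values for c in self.convs]
        return self.fc(self.drop(torch.cat(pooled, dim=1)))

logits = SentenceCNN()(torch.randint(0, 10000, (8, 40)))
print(logits.shape)   # torch.Size([8, 2])
```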
  • We propose Efficient Neural Architecture Search (ENAS), a fast and inexpensive approach for automatic model design. In ENAS, a controller learns to discover neural network architectures by searching for an optimal subgraph within a large computational graph. The controller is trained with policy gradient to select a subgraph that maximizes the expected reward on the validation set. Meanwhile the model corresponding to the selected subgraph is trained to minimize a canonical cross entropy loss. Thanks to parameter sharing between child models, ENAS is fast: it delivers strong empirical performances using much fewer GPU-hours than all existing automatic model design approaches, and notably, 1000x less expensive than standard Neural Architecture Search. On the Penn Treebank dataset, ENAS discovers a novel architecture that achieves a test perplexity of 55.8, establishing a new state-of-the-art among all methods without post-training processing. On the CIFAR-10 dataset, ENAS designs novel architectures that achieve a test error of 2.89%, which is on par with NASNet (Zoph et al., 2018), whose test error is 2.65%.
    Architecture, Recurrent neural network, Activation function, Directed acyclic graph, Graph, Entropy, Long short term memory, Neural network, Rank, Regularization...
  • Learning algorithms related to artificial neural networks and in particular for Deep Learning may seem to involve many bells and whistles, called hyper-parameters. This chapter is meant as a practical guide with recommendations for some of the most commonly used hyper-parameters, in particular in the context of learning algorithms based on back-propagated gradient and gradient-based optimization. It also discusses how to deal with the fact that more interesting results can be obtained when allowing one to adjust many hyper-parameters. Overall, it describes elements of the practice used to successfully and efficiently train and debug large-scale and often deep multi-layer neural networks. It closes with open questions about the training difficulties observed with deeper architectures.
    Regularization, Autoencoder, Training set, Graph, Stochastic gradient descent, Overfitting, Early stopping, Generalization error, Hyperparameter, Architecture...
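    One of the chapter's practical recipes, early stopping on a validation signal, in schematic form; `train_step` and `val_loss` are assumed user-supplied callables.

```python
import copy

def train_with_early_stopping(model, train_step, val_loss, patience=10):
    """Keep training while the validation loss improves; once it has not
    improved for `patience` evaluations, return the best snapshot.

    train_step(model): performs one epoch of gradient updates in place.
    val_loss(model):   returns the current held-out loss (a float).
    """
    best, best_state, bad = float("inf"), None, 0
    while bad < patience:
        train_step(model)               # one epoch of SGD updates
        loss = val_loss(model)          # held-out estimate of gen. error
        if loss < best:
            best, bad = loss, 0
            best_state = copy.deepcopy(model)   # snapshot the best model
        else:
            bad += 1
    return best_state, best
```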
  • Topology is quickly becoming a cornerstone in our understanding of electronic systems. Like their electronic counterparts, bosonic systems can exhibit a topological band structure, but in real materials it is difficult to ascertain their topological nature, as their ground state is a simple condensate or the vacuum, and one has to rely instead on excited states, for example a characteristic thermal Hall response. Here we propose driving a topological magnon insulator with an electromagnetic field and show that this causes edge mode instabilities and a large non-equilibrium steady-state magnon edge current. Building on this, we discuss several experimental signatures that unambiguously establish the presence of topological magnon edge modes. Furthermore, our amplification mechanism can be employed to power a topological travelling-wave magnon amplifier and topological magnon laser, with applications in magnon spintronics. This work thus represents a step toward functional topological magnetic materials.
    Magnon, Hamiltonian, Instability, Electronic band structure, Steady state, Dzyaloshinskii-Moriya interaction, Hall effect, Unit cell, Spintronics, Non-equilibrium steady states...
  • The large $N$ expansion of giant graviton correlators is considered. Giant gravitons are described using operators with a bare dimension of order $N$. In this case the usual $1/N$ expansion is not applicable and there are contributions to the correlator that are non-perturbative in character. By writing the (square of the) correlators in terms of the hypergeometric function ${}_2F_1(a,b;c;1)$, we are able to rephrase the $1/N$ expansion of the correlator as a semi-classical expansion for a Schrödinger equation. In this way we are able to argue that the $1/N$ expansion of the correlator is Borel summable and that it exhibits a parametric Stokes phenomenon as the angular momentum of the giant graviton is varied.
    Graviton, Wentzel-Kramers-Brillouin approximation, Graph, Schur polynomial, Two-point correlation function, Hypergeometric function, Instanton, Resurgence, Conformal field theory, String theory...
  • Recent observations at high spatial resolution have shown that magnetic flux cancellation occurs on the solar surface much more frequently than previously thought, and this led Priest et al. (2018) to propose magnetic reconnection driven by photospheric flux cancellation as a mechanism for chromospheric and coronal heating. In particular, they estimated analytically the amount of energy released as heat and the height of the energy release during flux cancellation. In the present work, we take the next step in the theory by setting up a two-dimensional resistive MHD simulation of two canceling polarities in the presence of a horizontal external field and a stratified atmosphere, in order to check and improve upon the analytical estimates. The computational evaluation of the energy release during reconnection is found to be in good qualitative agreement with the analytical estimates. In addition, we go further and undertake an initial study of the atmospheric response to reconnection. We find that, during the cancellation, either hot ejections or cool ones or a combination of both can be formed, depending on the height of the reconnection location. The hot structures can have the density and temperature of coronal loops, while the cooler structures are suggestive of surges and large spicules.
    Photosphere, Cooling, Coronal loop, Magnetic energy, Coronal heating, Corona, Chromosphere, Magnetic reconnection, Nanoflares, Mach number...
  • Background: The stochastic behavior of patient arrivals at an emergency department (ED) complicates the management of an ED. More than 50% of hospital EDs tend to operate beyond their normal capacity and eventually fail to deliver high-quality care. To address the concern of stochastic ED arrivals, much research has been done using yearly, monthly and weekly time-series forecasting. Aim: Our research team believes that hourly time-series forecasting of the load can improve ED management by predicting the arrivals of future patients, and thus can support strategic decisions in terms of quality enhancement. Methods: Our research does not involve any human subjects, only ED admission data from January 2014 to August 2017 retrieved from the UnityPoint Health database. Autoregressive integrated moving average (ARIMA), Holt-Winters, TBATS, and neural network methods were implemented to forecast hourly ED patient arrivals. Findings: ARIMA(3,0,0)(2,1,0) was selected as the best-fit model, with the minimum Akaike information criterion and Schwarz Bayesian criterion. The model was stationary and passed the Ljung-Box correlation test and the Jarque-Bera test for normality. The mean error (ME) and root mean square error (RMSE) were selected as performance measures. An ME of 1.001 and an RMSE of 1.55 were obtained. Conclusions: ARIMA can be used to provide hourly forecasts for ED arrivals and can be utilized as a decision support system in the healthcare industry. Application: This technique can be implemented in hospitals worldwide to predict ED patient arrival.
    Time Series, Data warehouse system, Akaike information criterion, Bayesian, Jarque-Bera test, Mean squared error, Neural network, Teams...
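    A sketch of the reported model in statsmodels, assuming the seasonal period is 24 hours (the abstract reports ARIMA(3,0,0)(2,1,0) but does not state the period) and using synthetic Poisson arrivals in place of the UnityPoint Health data.

```python
import numpy as np
import pandas as pd
from statsmodels.tsa.statespace.sarimax import SARIMAX

# Synthetic stand-in for hourly ED arrival counts: Poisson arrivals
# with a daily cycle over 120 days.
rng = np.random.default_rng(0)
hours = pd.date_range("2014-01-01", periods=24 * 120, freq="h")
arrivals = pd.Series(
    rng.poisson(8 + 4 * np.sin(2 * np.pi * hours.hour / 24)), index=hours)

# ARIMA(3,0,0)(2,1,0) with an assumed seasonal period of s = 24.
model = SARIMAX(arrivals, order=(3, 0, 0), seasonal_order=(2, 1, 0, 24))
fit = model.fit(disp=False)
print(fit.aic)                          # model selection used AIC/SBC

forecast = fit.forecast(steps=24)       # next day's hourly arrivals
rmse = np.sqrt(((fit.fittedvalues - arrivals) ** 2).mean())
print(f"in-sample RMSE: {rmse:.2f}")    # on synthetic data, not the paper's
```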