Recently bookmarked papers

with concepts:
  • Python is rapidly becoming the lingua franca of machine learning and scientific computing. With the broad use of frameworks such as Numpy, SciPy, and TensorFlow, scientific computing and machine learning are seeing a productivity boost on systems without a requisite loss in performance. While high-performance libraries often provide adequate performance within a node, distributed computing is required to scale Python across nodes and make it genuinely competitive in large-scale high-performance computing. Many frameworks, such as Charm4Py, DaCe, Dask, Legate Numpy, mpi4py, and Ray, scale Python across nodes. However, little is known about these frameworks' relative strengths and weaknesses, leaving practitioners and scientists without enough information about which frameworks are suitable for their requirements. In this paper, we seek to narrow this knowledge gap by studying the relative performance of two such frameworks: Charm4Py and mpi4py. We perform a comparative performance analysis of Charm4Py and mpi4py using CPU- and GPU-based microbenchmarks and other representative mini-apps for scientific computing.
    Python, Programming, Software, High Performance Computing, Particle-in-cell, Machine learning, Electroweak scale, Optimization, Multidimensional Array, SciPy, ...
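    A minimal flavor of the message-passing side of this comparison, as a hedged sketch only (not the authors' benchmark code; the message size, repetition count, and file name are arbitrary): a two-rank mpi4py ping-pong that reports round-trip latency and effective bandwidth.

      # pingpong.py -- run with: mpiexec -n 2 python pingpong.py
      import numpy as np
      from mpi4py import MPI

      comm = MPI.COMM_WORLD
      rank = comm.Get_rank()

      nbytes = 1 << 20                          # 1 MiB message
      buf = np.zeros(nbytes, dtype=np.uint8)
      reps = 100

      comm.Barrier()
      t0 = MPI.Wtime()
      for _ in range(reps):
          if rank == 0:
              comm.Send(buf, dest=1)            # buffer-protocol fast path for NumPy arrays
              comm.Recv(buf, source=1)
          elif rank == 1:
              comm.Recv(buf, source=0)
              comm.Send(buf, dest=0)
      t1 = MPI.Wtime()

      if rank == 0:
          rtt = (t1 - t0) / reps
          print(f"round trip {rtt*1e6:.1f} us, "
                f"~{2 * nbytes / rtt / 1e9:.2f} GB/s effective bandwidth")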
  • Today's scientific simulations require a significant reduction of data volume because of the extremely large amounts of data they produce and the limited I/O bandwidth and storage space. Error-bounded lossy compression has been considered one of the most effective solutions to the above problem. In practice, however, the best-fit compression method often needs to be customized/optimized in particular because of diverse characteristics in different datasets and various user requirements on the compression quality and performance. In this paper, we develop a novel modular, composable compression framework (namely SZ3), which involves three significant contributions. (1) SZ3 features a modular abstraction for the prediction-based compression framework such that new compression modules can be plugged in easily. (2) SZ3 supports multialgorithm predictors and can automatically select the best-fit predictor for each data block based on the designed error estimation criterion. (3) SZ3 allows users to easily compose different compression pipelines on demand, such that both compression quality and performance can be significantly improved for their specific datasets and requirements. In addition, we evaluate several lossy compressors composed from SZ3 using real-world datasets. Specifically, we leverage SZ3 to improve the compression quality and performance for different use-cases, including the GAMESS quantum chemistry dataset and the Advanced Photon Source (APS) instrument dataset. Experiments show that our customized compression pipelines lead to up to 20% improvement in compression ratios under the same data distortion compared with the state-of-the-art approaches.
    Quantum chemistry, Simulations, Photon
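    To make the "prediction-based, error-bounded" idea concrete, here is a toy Python sketch (illustrative only, not SZ3's API): a previous-value predictor with uniform quantization of the residual at bin width 2*eps, which keeps every decoded value within eps of the original; a real compressor adds block-wise predictor selection and lossless coding of the quantization codes.

      import numpy as np

      def compress(data, eps):
          codes = np.empty(data.size, dtype=np.int64)
          prev = 0.0                      # decoded value of the previous element
          for i, x in enumerate(data.ravel()):
              pred = prev                 # predict from the last *decoded* value
              q = int(np.round((x - pred) / (2 * eps)))
              codes[i] = q
              prev = pred + 2 * eps * q   # mirror the decoder to bound drift
          return codes

      def decompress(codes, eps, shape):
          out = np.empty(codes.size)
          prev = 0.0
          for i, q in enumerate(codes):
              prev = prev + 2 * eps * q
              out[i] = prev
          return out.reshape(shape)

      data = np.cumsum(np.random.randn(1000))   # smooth-ish test signal
      eps = 1e-2
      rec = decompress(compress(data, eps), eps, data.shape)
      assert np.max(np.abs(rec - data)) <= eps + 1e-12   # pointwise error bound holds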
  • Physics-informed neural networks (PINNs) are an increasingly powerful way to solve partial differential equations, generate digital twins, and create neural surrogates of physical models. In this manuscript we detail the inner workings of NeuralPDE.jl and show how a formulation structured around numerical quadrature gives rise to new loss functions which allow for adaptivity towards bounded error tolerances. We describe the various ways one can use the tool, detailing mathematical techniques like using extended loss functions for parameter estimation and operator discovery, to help potential users adopt these PINN-based techniques into their workflow. We showcase how NeuralPDE uses a purely symbolic formulation so that all of the underlying training code is generated from an abstract formulation, and show how to make use of GPUs and solve systems of PDEs. Afterwards we give a detailed performance analysis which showcases the trade-off between training techniques on a large set of PDEs. We end by focusing on a complex multiphysics example, the Doyle-Fuller-Newman (DFN) Model, and showcase how this PDE can be formulated and solved with NeuralPDE. Together this manuscript is meant to be a detailed and approachable technical report to help potential users of the technique quickly get a sense of the real-world performance trade-offs and use cases of the PINN techniques.
    Neural network, Digital Twin, Quadrature, Partial differential equation, Potential, ...
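    The core loss construction is easy to sketch outside Julia; the following is a hedged PyTorch toy (not NeuralPDE.jl code, which generates its losses symbolically and can weight residuals with quadrature rules) for -u''(x) = pi^2 sin(pi x) with u(0) = u(1) = 0.

      import math
      import torch

      net = torch.nn.Sequential(
          torch.nn.Linear(1, 32), torch.nn.Tanh(),
          torch.nn.Linear(32, 32), torch.nn.Tanh(),
          torch.nn.Linear(32, 1),
      )
      opt = torch.optim.Adam(net.parameters(), lr=1e-3)

      def pde_residual(x):
          x = x.requires_grad_(True)
          u = net(x)
          du = torch.autograd.grad(u, x, torch.ones_like(u), create_graph=True)[0]
          d2u = torch.autograd.grad(du, x, torch.ones_like(du), create_graph=True)[0]
          return -d2u - math.pi**2 * torch.sin(math.pi * x)

      for step in range(2000):
          x = torch.rand(128, 1)                     # interior collocation points
          xb = torch.tensor([[0.0], [1.0]])          # boundary points
          loss = pde_residual(x).pow(2).mean() + net(xb).pow(2).mean()
          opt.zero_grad(); loss.backward(); opt.step()

      print(float(loss))   # approaches zero as the network approximates sin(pi x)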
  • Standard library implementations of functions like sin and exp optimize for accuracy, not speed, because they are intended for general-purpose use. But applications tolerate inaccuracy from cancellation, rounding error, and singularities, sometimes even very high error, and many applications could tolerate error in function implementations as well. This raises an intriguing possibility: speeding up numerical code by tuning standard function implementations. This paper thus introduces OpTuner, an automatic method for selecting the best implementation of mathematical functions at each use site. OpTuner assembles dozens of implementations for the standard mathematical functions from across the speed-accuracy spectrum. OpTuner then uses error Taylor series and integer linear programming to compute optimal assignments of function implementation to use site and presents the user with a speed-accuracy Pareto curve they can use to speed up their code. In a case study on the POV-Ray ray tracer, OpTuner speeds up a critical computation, leading to a whole program speedup of 9% with no change in the program output (whereas human efforts result in slower code and lower-quality output). On a broader study of 37 standard benchmarks, OpTuner matches 216 implementations to 89 use sites and demonstrates speed-ups of 107% for negligible decreases in accuracy and of up to 438% for error-tolerant applications.
    Programming, Linear optimization, Taylor series
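    The assignment problem OpTuner solves can be illustrated with a toy enumeration (made-up cost and error numbers; the real tool uses error Taylor series and integer linear programming rather than brute force): pick one implementation per use site so the summed error estimate stays under a budget while cost is minimized.

      from itertools import product

      # candidate implementations per use site: (name, cost in ns, worst-case error in ulps)
      sites = {
          "sin@loop1": [("libm_sin", 40, 0.5), ("fast_sin", 12, 8.0)],
          "exp@loop2": [("libm_exp", 35, 0.5), ("fast_exp", 10, 16.0)],
      }
      error_budget = 10.0

      best = None
      for choice in product(*sites.values()):
          cost = sum(c for _, c, _ in choice)
          err = sum(e for _, _, e in choice)        # crude additive error model
          if err <= error_budget and (best is None or cost < best[0]):
              best = (cost, err, [name for name, _, _ in choice])

      print(best)   # cheapest assignment that respects the error budget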
  • We introduce Merlion, an open-source machine learning library for time series. It features a unified interface for many commonly used models and datasets for anomaly detection and forecasting on both univariate and multivariate time series, along with standard pre/post-processing layers. It has several modules to improve ease-of-use, including visualization, anomaly score calibration to improve interpretability, AutoML for hyperparameter tuning and model selection, and model ensembling. Merlion also provides a unique evaluation framework that simulates the live deployment and re-training of a model in production. This library aims to provide engineers and researchers with a one-stop solution to rapidly develop models for their specific time series needs and benchmark them across multiple time series datasets. In this technical report, we highlight Merlion's architecture and major functionalities, and we report benchmark numbers across different baseline models and ensembles.
    Time Series, Anomaly detection, Hyperparameter, Machine learning, Model selection, Deep learning, Open source, Calibration, Application programming interface, F1 score, ...
  • In this paper we describe the research and development activities in the Center for Efficient Exascale Discretization within the US Exascale Computing Project, targeting state-of-the-art high-order finite-element algorithms for high-order applications on GPU-accelerated platforms. We discuss the GPU developments in several components of the CEED software stack, including the libCEED, MAGMA, MFEM, libParanumal, and Nek projects. We report performance and capability improvements in several CEED-enabled applications on both NVIDIA and AMD GPU systems.
    Discretization, Software, Architecture, Programming, Quadrature, Optimization, High Performance Computing, Engineering, Research and Development, Large eddy simulation, ...
  • We present the software design of Gridap, a novel finite element library written exclusively in the Julia programming language, which is being used by several research groups world-wide to simulate complex physical phenomena such as magnetohydrodynamics, photonics, weather modeling, non-linear solid mechanics, and fluid-structure interaction problems. The library provides a feature-rich set of discretization techniques for the numerical approximation of a wide range of PDEs, including linear, nonlinear, single-field, and multi-field equations. An expressive API allows users to define PDEs in weak form by a syntax close to the mathematical notation. While this is also available in previous codes, the main novelty of Gridap is that it implements this API without introducing a DSL plus a compiler of variational forms. Instead, it leverages the Julia just-in-time compiler to build efficient code, specialized for the concrete problem at hand. As a result, there is no need to use different languages for the computational back-end and the user front-end anymore, thus eliminating the so-called two-language problem. Gridap also provides a low-level API that is modular and extensible via the multiple-dispatch paradigm of Julia and provides easy access to the main building blocks of the library. The main contribution of this paper is the detailed presentation of the novel software abstractions behind the Gridap design that leverages the new software possibilities provided by the Julia language. The second main contribution of the article is a performance comparison against FEniCS. We measure CPU times needed to assemble discrete systems of linear equations for different problem types and show that the performance of Gridap is comparable to FEniCS, demonstrating that the new software design does not compromise performance. Gridap is freely available at Github and distributed under an MIT license.
    Multidimensional Array, Application programming interface, Software, Caching, Python, Quadrature, Programming Language, Discretization, Sparsity, Optimization, ...
  • Machine learning and neural network models in particular have been improving the state of the art performance on many artificial intelligence related tasks. Neural network models are typically implemented using frameworks that perform gradient based optimization methods to fit a model to a dataset. These frameworks use a technique of calculating derivatives called automatic differentiation (AD) which removes the burden of performing derivative calculations from the model designer. In this report we describe AD, its motivations, and different implementation approaches. We briefly describe dataflow programming as it relates to AD. Lastly, we present example programs that are implemented with Tensorflow and PyTorch, which are two commonly used AD frameworks.
    Programming, Graph, Machine learning, Network model, Neural network, Optimization, Python, Direct edge, Arithmetic, Programming Language, ...
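    In the spirit of the report's example programs (these are not its listings), the same reverse-mode derivative in both frameworks: d/dx of x^3 + 2x at x = 2 is 3*2^2 + 2 = 14.

      import torch
      import tensorflow as tf

      # PyTorch: build the computation eagerly, then backpropagate
      x = torch.tensor(2.0, requires_grad=True)
      y = x**3 + 2 * x
      y.backward()
      print(x.grad)                  # tensor(14.)

      # TensorFlow: record operations on a GradientTape, then ask for the gradient
      xt = tf.Variable(2.0)
      with tf.GradientTape() as tape:
          yt = xt**3 + 2 * xt
      print(tape.gradient(yt, xt))   # tf.Tensor(14.0, ...)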
  • Through mathematical models, it is possible to turn a problem from the physical domain into the computational domain. In this context, the paper presents a two-dimensional mesh generator in generalized coordinates, which uses the Parametric Linear Spline method and partial differential equations. The generator is automated and able to treat real complex domains. The code was implemented in Python, applying the Numpy and Matplotlib libraries for matrix manipulations and graphical plots, respectively. Applications are made for monoblock meshes (the two-dimensional shape of a bottle) and multi-block meshes (the geometry of Igapó I lake, Londrina, Paraná, Brazil).
    Generalized coordinates, Python, Partial differential equation, Discretization, Programming Language, Software, Programming, Magnetic adiabatic collimation, Parallel Algorithm, Calibration, ...
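    A compressed sketch of PDE-based structured mesh generation of the kind described (illustrative only, not the paper's code; the boundary curves are invented): build an algebraic grid by transfinite interpolation between two boundary curves, then smooth the interior nodes with Jacobi iterations of Laplace's equation using NumPy, and draw the grid lines with Matplotlib.

      import numpy as np
      import matplotlib.pyplot as plt

      ni, nj = 41, 21
      s = np.linspace(0.0, 1.0, ni)
      bottom = np.stack([s, 0.15 * np.sin(np.pi * s)], axis=1)   # made-up lower boundary
      top    = np.stack([s, 1.0 - 0.25 * s**2], axis=1)          # made-up upper boundary

      # algebraic (transfinite) grid blending the two boundaries
      t = np.linspace(0.0, 1.0, nj)[None, :, None]
      grid = (1 - t) * bottom[:, None, :] + t * top[:, None, :]   # shape (ni, nj, 2)

      # Laplace smoothing of interior nodes; boundary nodes stay fixed
      for _ in range(500):
          grid[1:-1, 1:-1] = 0.25 * (grid[2:, 1:-1] + grid[:-2, 1:-1] +
                                     grid[1:-1, 2:] + grid[1:-1, :-2])

      plt.plot(grid[..., 0], grid[..., 1], "k-", lw=0.5)      # j = const grid lines
      plt.plot(grid[..., 0].T, grid[..., 1].T, "k-", lw=0.5)  # i = const grid lines
      plt.axis("equal"); plt.show()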
  • The original "Seven Motifs" set forth a roadmap of essential methods for the field of scientific computing, where a motif is an algorithmic method that captures a pattern of computation and data movement. We present the "Nine Motifs of Simulation Intelligence", a roadmap for the development and integration of the essential algorithms necessary for a merger of scientific computing, scientific simulation, and artificial intelligence. We call this merger simulation intelligence (SI), for short. We argue the motifs of simulation intelligence are interconnected and interdependent, much like the components within the layers of an operating system. Using this metaphor, we explore the nature of each layer of the simulation intelligence operating system stack (SI-stack) and the motifs therein: (1) Multi-physics and multi-scale modeling; (2) Surrogate modeling and emulation; (3) Simulation-based inference; (4) Causal modeling and inference; (5) Agent-based modeling; (6) Probabilistic programming; (7) Differentiable programming; (8) Open-ended optimization; (9) Machine programming. We believe coordinated efforts between motifs offer immense opportunity to accelerate scientific discovery, from solving inverse problems in synthetic biology and climate science, to directing nuclear energy experiments and predicting emergent behavior in socioeconomic settings. We elaborate on each layer of the SI-stack, detailing the state-of-the-art methods, presenting examples to highlight challenges and opportunities, and advocating for specific ways to advance the motifs and the synergies from their combinations. Advancing and integrating these technologies can enable a robust and efficient hypothesis-simulation-analysis type of scientific method, which we introduce with several use-cases for human-machine teaming and automated science.
    Programming, Inference, Optimization, Machine learning, Intelligence, Neural network, Probabilistic programming language, Software, Deep learning, Engineering, ...
  • We investigate and quantify the impact of mixed (cold and warm) dark matter models on large-scale structure observables. In this scenario, dark matter comes in two phases, a cold one (CDM) and a warm one (WDM): the presence of the latter causes a suppression in the matter power spectrum which is allowed by current constraints and may be detected in present-day and upcoming surveys. We run a large set of $N$-body simulations in order to build an efficient and accurate emulator to predict the aforementioned suppression with percent precision over a wide range of values for the WDM mass, $M_\mathrm{wdm}$, and its fraction with respect to the totality of dark matter, $f_\mathrm{wdm}$. The suppression in the matter power spectrum is found to be independent of changes in the cosmological parameters at the 2% level for $k\lesssim 10 \ h/$Mpc and $z\leq 3.5$. In the same ranges, by applying a baryonification procedure on both $\Lambda$CDM and CWDM simulations to account for the effect of feedback, we find a similar level of agreement between the two scenarios. We examine the impact that such suppression has on weak lensing and angular galaxy clustering power spectra. Finally, we discuss the impact of mixed dark matter on the shape of the halo mass function and which analytical prescription yields the best agreement with simulations. We provide the reader with an application to galaxy cluster number counts.
    Cold plus warm dark matter, Warm dark matter, Matter power spectrum, Halo mass function, Galaxy clustering, Free streaming of particles, Dark matter, Weak lensing, WDM particle mass, Cosmological parameters, ...
  • We carry out a test of the cosmic distance duality relation using a sample of 52 SPT-SZ clusters, along with X-ray measurements from XMM-Newton. To carry out this test, we need an estimate of the luminosity distance ($D_L$) at the redshift of the cluster. For this purpose, we use three independent methods: directly using $D_L$ from the closest Type Ia Supernovae from the Union 2.1 sample, non-parametric reconstruction of $D_L$ using the same Union 2.1 sample, and finally using $H(z)$ measurements from cosmic chronometers and reconstructing $D_L$ using Gaussian Process regression. We use four different functions to characterize the deviations from CDDR. All our results for these ($4 \times 3$) analyses are consistent with CDDR to within 1$\sigma$.
    Cluster of galaxies, Supernova Type Ia, Angular diameter distance, Luminosity distance, Duality, XMM-Newton, Gaussian Process Regression, Pressure profile, Cluster sampling, Cosmology, ...
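    For context (standard background, not spelled out in the abstract): the relation under test is $\eta(z) \equiv D_L(z)/[(1+z)^2 D_A(z)] = 1$, where $D_A$ is the angular diameter distance obtained here from the combined SZ and X-ray cluster measurements; deviations are commonly parameterized by simple forms such as $\eta(z) = 1 + \eta_0 z$, with CDDR recovered when $\eta_0 = 0$ (the four specific functions used in the paper are not listed in the abstract).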
  • Using the state-of-the-art suite of hydrodynamic simulations Simba, as well as its dark-matter-only counterpart, we study the impact of the presence of baryons and of different stellar/AGN feedback mechanisms on large-scale structure, halo density profiles, and on the abundance of different baryonic phases within halos and in the intergalactic medium (IGM). The unified picture that emerges from our analysis is that the main physical drivers shaping the distribution of matter at all scales are star formation-driven galactic outflows at $z>2$ for lower mass halos and AGN jets at $z<2$ in higher mass halos. Feedback suppresses the baryon mass function with time relative to the halo mass function, and it even impacts the halo mass function itself at the ~20% level, particularly evacuating the centres and enhancing dark matter just outside halos. At early epochs baryons pile up in the centres of halos, but by late epochs and particularly in massive systems gas has mostly been evacuated from within the inner halo. AGN jets are so efficient at such evacuation that at low redshifts the baryon fraction within $\sim 10^{12}-10^{13} \, \rm M_{\odot}$ halos is only 25% of the cosmic baryon fraction, mostly in stars. The baryon fraction enclosed in a sphere around such halos approaches the cosmic value $\Omega_{\rm b}/\Omega_{\rm m}$ only at 10-20 virial radii. As a result, 87% of the baryonic mass in the Universe lies in the IGM at $z=0$, with 67% being in the form of warm-hot IGM ($T>10^5 \, \rm K$).
    Halo mass function, Virial mass, Circumgalactic medium, AGN jets, Active Galactic Nuclei, Intergalactic medium, Interstellar medium, Cooling, Star, Star formation, ...
  • A longstanding question in quantum gravity regards the localization of quantum information; one way to formulate this question is to ask how subsystems can be defined in quantum-gravitational systems. The gauge symmetry and necessity of solving the constraints appear to imply that the answers to this question here are different than in finite quantum systems, or in local quantum field theory. Specifically, the constraints can be solved by providing a "gravitational dressing" for the underlying field-theory operators, but this modifies their locality properties. It has been argued that holography itself may be explained through this role of the gauge symmetry and constraints, at the nonperturbative level, but there are also subtleties in constructing a holographic map in this approach. There are also claims that holography is implied even by perturbative solution of the constraints. This short note provides further examination of these questions, and in particular investigates to what extent perturbative or nonperturbative solution of the constraints implies that information naively thought to be localized can be recovered by asymptotic measurements, and the relevance of this in defining subsystems. In the leading perturbative case, the relevant effects are seen to be exponentially suppressed. These questions are, for example, important in sharply characterizing the unitarity problem for black holes.
    Quantum gravity, Black hole, Holographic principle, Gauge symmetry, Unitarity, Expectation Value, Anti de Sitter space, Quantum information, Field theory, Local quantum field theory, ...
  • In recent years, there have been two independent but related developments in the study of irrelevant deformations in two dimensional quantum field theories (QFTs). The first development is the deformation of a two dimensional QFT by the determinant of the energy momentum stress tensor, commonly referred to as $T{\bar T}$ deformation. The second development is in two dimensional holographic field theories which are dual to string theory in asymptotically Anti-de Sitter (AdS) spacetimes. In this latter development, the deformation is commonly referred to as single-trace $T{\bar T}$ deformation. The single-trace $T{\bar T}$ deformation corresponds in the bulk to a string background that interpolates between AdS spacetime in the infrared (IR) and a linear dilaton spacetime (vacuum of little string theory (LST)) in the ultraviolet (UV). It serves as a useful tool and guide to better understand and explore holography in asymptotically AdS and non-AdS spacetimes in a controlled setting. In particular, it is useful to gain insights into holography in flat spacetimes. The dissertation is devoted to the study of single-trace $T{\bar T}$ deformation and its single-trace generalizations in theories with $U(1)$ currents, namely $J\bar T$ and $T\bar J$ deformations, in the context of gauge/gravity duality. In the dissertation I present new results in the study of holography in asymptotically non-AdS spacetimes. I discuss two point correlation functions in single-trace $T{\bar T}$ deformation, and entanglement entropy and entropic $c$-function in single-trace $T{\bar T}$, $J\bar T$ and $T\bar J$ deformations.
    Quantum field theory, Conformal field theory, Entanglement entropy, String theory, Field theory, Long strings, Infrared limit, Renormalization group, Holographic principle, Vertex operator, ...
  • The mass function of globular cluster (GC) populations is a fundamental observable that encodes the physical conditions under which these massive stellar clusters formed and evolved. The high-mass end of star cluster mass functions are commonly described using a Schechter function, with an exponential truncation mass $M_{c,*}$. For the GC mass functions in the Virgo galaxy cluster, this truncation mass increases with galaxy mass ($M_{*}$). In this paper we fit Schechter mass functions to the GCs in the most massive galaxy group ($M_{\mathrm{200}} = 5.14 \times 10^{13} M_{\odot}$) in the E-MOSAICS simulations. The fiducial cluster formation model in E-MOSAICS reproduces the observed trend of $M_{c,*}$ with $M_{*}$ for the Virgo cluster. We therefore examine the origin of the relation by fitting $M_{c,*}$ as a function of galaxy mass, with and without accounting for mass loss by two-body relaxation, tidal shocks and/or dynamical friction. In the absence of these mass-loss mechanisms, the $M_{c,*}$-$M_{*}$ relation is flat above $M_* > 10^{10} M_{\odot}$. It is therefore the disruption of high-mass GCs in galaxies with $M_{*}\sim 10^{10} M_{\odot}$ that lowers the $M_{c,*}$ in these galaxies. High-mass GCs are able to survive in more massive galaxies, since there are more mergers to facilitate their redistribution to less-dense environments. The $M_{c,*}-M_*$ relation is therefore a consequence of both the formation conditions of massive star clusters and their environmentally-dependent disruption mechanisms.
    Globular cluster, Galaxy, Galaxy mass, Mass function, Dynamical friction, High mass, Stellar mass, Milky Way, Massive galaxies, Schechter function, ...
  • Most spiral galaxies have a flat rotational velocity curve, according to the different observational techniques used across several wavelength domains. In this work, we show that non-linear terms are able to balance the dispersive effect of the wave, thus recovering the observed rotation-curve profiles without inclusion of anything but baryonic matter concentrated in the bulge and disk. To demonstrate that the considered model is able to restore a flat rotation curve, the Milky Way has been chosen as the best-mapped galaxy to apply it to. Using gravitational N-body simulations with up to $10^7$ particles, we test this dynamical model in the case of the Milky Way with two different approaches. Within the direct approach, as an input condition in the simulation runs we set the spiral surface density distribution obtained previously as an explicit solution to the non-linear Schrödinger equation (instead of the widely used exponential disk approximation). In the evolutionary approach, we initialize the runs with different initial mass and rotational velocity distributions, in order to capture the natural formation of spiral arms and to determine their role in the disk evolution. In both cases we are able to reproduce stable and non-expanding disk structures at the simulation end times of $\sim10^9$ years, with no halo inclusion. Although the given model does not take into account the velocity dispersion of stars or finite disk thickness, the results presented here still imply that non-linear effects can significantly alter the amount of dark matter required to keep the galactic disk in a stable configuration.
    Soliton, Milky Way, Galaxy, Spiral galaxy, Star, N-body simulation, Galactic disks, Dark matter, Spiral arm, Spiral structure, ...
  • The dynamics of the intracluster medium (ICM) is affected by turbulence driven by several processes, such as mergers, accretion and feedback from active galactic nuclei. X-ray surface brightness fluctuations have been used to constrain turbulence in galaxy clusters. Here, we use simulations to further investigate the relation between gas density and turbulent velocity fluctuations, with a focus on the effect of the stratification of the ICM. In this work, we studied the turbulence driven by hierarchical accretion by analysing a sample of galaxy clusters simulated with the cosmological code ENZO. We used a fixed scale filtering approach to disentangle laminar from turbulent flows. In dynamically perturbed galaxy clusters, we found a relation between the root mean square of density and velocity fluctuations, albeit with a different slope than previously reported. The Richardson number is a parameter that represents the ratio between turbulence and buoyancy, and we found that this variable has a strong dependence on the filtering scale. However, we could not detect any strong relation between the Richardson number and the logarithmic density fluctuations, in contrast to results by recent and more idealised simulations. In particular, we find a strong effect from radial accretion, which appears to be the main driver for the gas fluctuations. The ubiquitous radial bias in the dynamics of the ICM suggests that homogeneity and isotropy are not always valid assumptions, even if the turbulent spectra follow Kolmogorov's scaling. Finally, we find that the slope of the velocity and density spectra are independent of cluster-centric radii.
    Intra-cluster medium, Turbulence, Cluster of galaxies, Richardson number, Velocity fluctuations, Accretion, Buoyancy, Cluster sampling, Surface brightness, Stratification, ...
  • In 1942 Haskell B. Curry presented what is now called Curry's paradox, which can be found in a logic independently of its stand on negation. In recent years there has been a revitalised interest in non-classical solutions to the semantic paradoxes. In this article a non-classical resolution of Curry's paradox and the Shaw-Kwei paradox is proposed, without rejecting any contraction postulate.
    Paradoxism, Curry's paradox, Conjunction, Set theory, Language, Universe, Materials, Force, ...
  • Murchikova et al. (2019) discovered a disk of cool ionized gas within 20,000 Schwarzschild radii of the Milky Way's Galactic Center black hole, Sagittarius A*. They further demonstrated that the ionizing photon flux in the region is enough to keep the disk ionized, but that there is no ample excess of this radiation. This raised the possibility that some neutral gas could also be present in the region, shielded within the cool ionized clumps. Here we present ALMA observations of a broad 1.3 millimeter hydrogen recombination line H30alpha: n = 31 -> 30, conducted during the flyby of the S0-2 star by Sgr A*. We report that the velocity-integrated H30alpha line flux two months prior to the S0-2 pericenter passage is about 20% larger than it was one month prior to the passage. S0-2 is a strong source of ionizing radiation moving at several thousand kilometers per second during the approach. Such a source is capable of ionizing parcels of neutral gas along its trajectory, resulting in variation of the recombination line spectra from epoch to epoch. We conclude that there are at least (6.6 +- 3.3) x 10^{-6} Msun of neutral gas within 20,000 Schwarzschild radii of Sgr A*.
    Cooling, Sagittarius A*, Star, Ionizing radiation, Recombination, Atacama Large Millimeter Array, Point source, Messier 19, Black hole, Calibration, ...
  • We present results from our analysis of the Hydra I cluster observed in neutral atomic hydrogen (HI) as part of the Widefield ASKAP L-band Legacy All-sky Blind Survey (WALLABY). These WALLABY observations cover a 60-square-degree field of view with uniform sensitivity and a spatial resolution of 30 arcsec. We use these wide-field observations to investigate the effect of galaxy environment on HI gas removal and star formation quenching by comparing the properties of cluster, infall and field galaxies extending up to $\sim5R_{200}$ from the cluster centre. We find a sharp decrease in the HI-detected fraction of infalling galaxies at a projected distance of $\sim1.5R_{200}$ from the cluster centre from $\sim0.85$ to $\sim0.35$. We see evidence for the environment removing gas from the outskirts of HI-detected cluster and infall galaxies through the decrease in the HI to $r$-band optical disc diameter ratio. These galaxies lie on the star forming main sequence, indicating that gas removal is not yet affecting the inner star-forming discs and is limited to the galaxy outskirts. Although we do not detect galaxies undergoing galaxy-wide quenching, we do observe a reduction in recent star formation in the outer disc of cluster galaxies, which is likely due to the smaller gas reservoirs present beyond the optical radius in these galaxies. Stacking of HI non-detections with HI masses below $M_{\rm{HI}}\lesssim10^{8.4}\,\rm{M}_{\odot}$ will be required to probe the HI of galaxies undergoing quenching at distances $\gtrsim60$ Mpc with WALLABY.
    Galaxy, Star formation, Main sequence star, Stellar mass, Milky Way, Quenching, Australian SKA Pathfinder, Wide-field Infrared Survey Explorer, Health informatics, Star formation rate, ...
  • We introduce the Hausdorff-Colombeau measure with respect to negative fractal dimensions. An axiomatic quantum field theory in spacetime with negative fractal dimensions is proposed. Spacetime is modelled as a multifractal subset of $R^{4}$ with positive and negative fractal dimensions. The cosmological constant problem arises because the magnitude of vacuum energy density predicted by quantum field theory is about 120 orders of magnitude larger than the value implied by cosmological observations of accelerating cosmic expansion. We point out that the fractal nature of quantum space-time with negative Hausdorff-Colombeau dimensions can resolve this tension. Canonical quantum field theory is widely believed to break down at some fundamental high-energy cutoff $E$, and therefore the quantum fluctuations in the vacuum can be taken seriously as classical only up to this high-energy cutoff. In this paper we argue that quantum field theory in fractal space-time with negative Hausdorff-Colombeau dimensions gives a high-energy cutoff in a natural way. In order to obtain the desired physical result we apply the canonical Pauli-Villars regularization up to $E$. This means that there exists a ghost-driven acceleration of the universe hidden in the cosmological constant.
    Fractal, Quantum field theory, Cosmological constant problem, Fractal dimension, Expansion of the Universe, Cosmological observation, Vacuum energy, Pauli-Villars regularization, Axiomatic quantum field theory, Cosmological constant, ...
  • In this paper a possible completion $^*R_{d}$ of the Robinson non-archimedean field $^*R$ is constructed by Dedekind sections. Given a class of analytic functions of one complex variable $f \in C[[z]]$, we investigate the arithmetic nature of the values of $f$ at transcendental points $e^{n}$. The main results are: 1) both of the numbers $e+\pi$ and $e\pi$ are irrational; 2) the number $e^{e}$ is transcendental. A nontrivial generalization of the Lindemann-Weierstrass theorem is obtained.
    Arithmetic, Lindemann-Weierstrass theorem, Field, Analytic function, ...
  • In this article we prove so-called strong reflection principles corresponding to formal theories Th which have omega-models. A possible generalization of Löb's theorem is considered. The main results are: (1) let $k$ be an inaccessible cardinal; then $\neg Con(ZFC+\exists k)$; (2) there is a Lindelöf $T_3$ indestructible space of pseudocharacter $\leqslant \aleph_1$ and size $\aleph_2$ in $L$.
    Standard Model, Löb's theorem, Topology, Theory, ...
  • In this paper we leave the neighborhood of the singularity at the origin and turn to the singularity at the horizon. Using nonlinear superdistributional geometry and supergeneralized functions, it seems possible to show that the horizon singularity is not only a coordinate singularity, without leaving Schwarzschild coordinates. However, the Tolman formula for the total energy $E$ of a static and asymptotically flat spacetime gives $E=mc^2$, as it should be. A new class of Colombeau solutions to the Einstein field equations is obtained. The vacuum energy density of a free scalar quantum field ${\Phi}$ with a distributional background spacetime is also considered. It has been widely believed that, except in very extreme situations, the influence of acceleration on quantum fields should amount to just small, sub-dominant contributions. Here we argue that this belief is wrong by showing that in a Rindler distributional background spacetime with distributional Levi-Cività connection the vacuum energy of free quantum fields is forced, by that very same distributional background space-time, to become dominant over any classical energy density component. This semiclassical gravity effect finds its roots in the singular behavior of quantum fields on a Rindler distributional space-time with distributional Levi-Cività connection. In particular we obtain that the vacuum fluctuations $<{\Phi}^2({\delta})>$ have a singular behavior at the Rindler horizon $\delta = 0$. Therefore a sufficiently strongly accelerated observer burns up near the Rindler horizon. Thus Polchinski's account does not violate the Einstein equivalence principle.
    Horizon, Regularization, Vacuum energy, Luminosity function, Einstein field equations, Einstein equivalence principle, Semiclassical gravity, Field, Geometry, Vacuum, ...
  • FAIR principles have the intent to act as a guideline for those wishing to enhance the reusability of their data holdings and put specific emphasis on enhancing the ability of machines to automatically find and use the data, in addition to supporting its reuse by individuals. Interoperability, one core of these principles, especially when dealing with automated systems' ability to interface with each other, requires open standards to avoid restrictions that negatively impact the user's experience. Openness of standards is best supported when the governance itself is open and includes a wide range of community participation. In this contribution we report our experience with the FAIR principles, interoperable systems and open governance in astrophysics. We report on activities that have matured within the ESCAPE project with a focus on interfacing the EOSC architecture and Interoperability Framework.
    FAIR Principles, Virtual observatory, Architecture, Software, Open source, Metadata schema, Keyphrase, Radio astronomy, Ecosystems, Horizon, ...
  • Astronomy is entering an era of data-driven discovery, due in part to modern machine learning (ML) techniques enabling powerful new ways to interpret observations. This shift in our scientific approach requires us to consider whether we can trust the black box. Here, we overview methods for an often-overlooked step in the development of ML models: building community trust in the algorithms. Trust is an essential ingredient not just for creating more robust data analysis techniques, but also for building confidence within the astronomy community to embrace machine learning methods and results.
    Machine learning, Overfitting, Classifier, Training set, Convolution Neural Network, Training Image, Astronomical data, Computational modelling, Astronomy, Algorithms, ...
  • In the coming decade, a new generation of massively multiplexed spectroscopic surveys, such as PFS, WAVES, and MOONS, will probe galaxies in the distant universe in vastly greater numbers than was previously possible. In this work, we generate mock catalogs for each of these three planned surveys to help quantify and optimize their scientific output. To assign photometry into the UniverseMachine empirical model, we develop the Calibrating Light: Illuminating Mocks By Empirical Relations (CLIMBER) procedure using UltraVISTA photometry. Using the published empirical selection functions for each aforementioned survey, we quantify the mass completeness of each survey. We compare different targeting strategies by varying the area and targeting completeness, and quantify how these survey parameters affect the uncertainty of the two-point correlation function. We demonstrate that the PFS and MOONS measurements will be primarily dominated by cosmic variance, not shot noise, motivating the need for increasingly large survey areas. On the other hand, the WAVES survey, which covers a much larger area, will strike a good balance between cosmic variance and shot noise. For a fixed number of targets, a 5% increased survey area (and $\sim$5% decreased completeness) would decrease the uncertainty of the correlation function at intermediate scales by 0.15%, 1.2%, and 1.1% for our WAVES, PFS, and MOONS samples, respectively. Meanwhile, for a fixed survey area, 5% increased targeting completeness improves the same constraints by 0.7%, 0.25%, and 0.1%. All of the utilities used to construct our mock catalogs and many of the catalogs themselves are publicly available.
    Galaxy, Two-point correlation function, Halo Occupation Distribution, Stellar mass, Completeness, Galactic halo, Star formation rate, Milky Way, Cosmic variance, Mass to light ratio, ...
  • Using the galaxy clusters from The Three Hundred Project, we define a new parameter, $\lambda_{DS}$, to describe the dynamical state of clusters, which assumes a double-Gaussian distribution in logarithmic scale for our mass-complete cluster sample at $z=0$ from the dark-matter-only (DMO) run. Therefore, the threshold for distinguishing relaxed and unrelaxed clusters is naturally determined by the crossing point of the double-Gaussian fitting, which has a value of $\lambda_{DS} = 3.424$. By applying $\lambda_{DS}$ with the same parameters from the DMO run to the hydrodynamically simulated clusters (Gadget-X run and GIZMO-SIMBA run), we investigate the effect of baryons on the cluster dynamical state. We find a weak baryon-model dependence for the $\lambda_{DS}$ parameter. Finally, we study the evolution of $\lambda_{DS}$ along with the clusters' mass accretion history. We notice an upper limit of halo mass change, $\frac{\Delta M_{200}}{M_{200}} \sim 0.12$, that does not alter the cluster dynamical state, i.e. from relaxed to unrelaxed. We define a relaxation period (from the most relaxed state to disturbed and relaxed again), which reflects how long it takes the dynamical state of a cluster to restore its relaxed state, and propose a correlation between this relaxation period and the strength of the halo mass change $\frac{\Delta M_{200}}{M_{200}}$. With the proposed fitting to this correlation, we verify that the relaxation period can be estimated from $\frac{\Delta M_{200}}{M_{200}}$ (including multiple mass-change peaks) with considerably small error.
    Cluster of galaxies, Relaxation, Milky Way, Relaxation time, Dynamical time, Intra-cluster medium, Accretion, Virial mass, Galaxy, N-body simulation, ...
  • We characterize homotopical equivalences between causal DAG models, exploiting the close connections between partially ordered set representations of DAGs (posets) and finite Alexandroff topologies. Alexandroff spaces yield a directional topological space: the topology is defined by a unique minimal basis defined by an open set for each variable x, specified as the intersection of all open sets containing x. Alexandroff spaces induce a (reflexive, transitive) preorder. Alexandroff spaces satisfying the Kolmogorov T0 separation criterion, where open sets distinguish variables, convert the preordering into a partial ordering. Our approach broadly is to construct a topological representation of posets from data, and then use the poset representation to build a conventional DAG causal model. We illustrate our framework by showing how it unifies disparate algorithms and case studies proposed previously. Topology plays two key roles in causal discovery. First, topological separability constraints on datasets have been used in several previous approaches to infer causal structure from observations and interventions. Second, a diverse range of graphical models used to represent causal structures can be represented in a unified way in terms of a topological representation of the induced poset structure. We show that the homotopy theory of Alexandroff spaces can be exploited to efficiently reduce the number of possible DAG structures, shrinking the search space by several orders of magnitude.
    Partially ordered set, Graphical model, Gene, Graph, Algebraic topology, Latent variable, Conditional Independence, Vaccine, Mutation, Machine learning, ...
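    A small illustration of the poset-to-Alexandroff construction the abstract describes (a sketch under the assumption that the specialization preorder is taken to be graph reachability, with each node's minimal open set being its up-set; not the authors' code):

      import networkx as nx

      G = nx.DiGraph([("X", "Y"), ("Y", "Z"), ("X", "W")])   # toy causal DAG

      def minimal_open_set(G, x):
          # intersection of all open sets containing x = {x} plus everything
          # reachable from x (its up-set in the reachability preorder)
          return {x} | nx.descendants(G, x)

      basis = {v: minimal_open_set(G, v) for v in G.nodes}
      print(basis)   # e.g. X -> {X, Y, Z, W}, Y -> {Y, Z}, Z -> {Z}, W -> {W}

      # T0 condition: distinct variables have distinct minimal open sets, which
      # holds here because reachability on a DAG is a partial order
      assert len({frozenset(s) for s in basis.values()}) == G.number_of_nodes()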
  • We complete the study of characters on higher rank semisimple lattices initiated in [BH19,BBHP20], the missing case being the case of lattices in higher rank simple algebraic groups in arbitrary characteristics. More precisely, we investigate dynamical properties of the conjugation action of such lattices on their space of positive definite functions. Our main results deal with the existence and the classification of characters from which we derive applications to topological dynamics, ergodic theory, unitary representations and operator algebras. Our key theorem is an extension of the noncommutative Nevo-Zimmer structure theorem obtained in [BH19] to the case of simple algebraic groups defined over arbitrary local fields. We also deduce a noncommutative analogue of Margulis' factor theorem for von Neumann subalgebras of the noncommutative Poisson boundary of higher rank arithmetic groups.
    Subgroup, Von Neumann algebra, Lattice (order), Rank, Compact modeling, Embedding, Isomorphism, Arithmetic group, Torus, Homomorphism, ...
  • This paper introduces a new topological space associated with a nonabelian free group $F_n$ of rank $n$ and a malnormal subgroup system $\mathcal{A}$ of $F_n$, called the space of currents relative to $\mathcal{A}$, which are $F_n$-invariant measures on an appropriate subspace of the double boundary of $F_n$. The extension from free factor systems as considered by Gupta to malnormal subgroup systems is necessary in order to fully study the growth under iteration of outer automorphisms of $F_n$, and requires the introduction of new techniques on cylinders. We in particular prove that currents associated with elements of $F_n$ which are not contained in a conjugate of a subgroup of $\mathcal{A}$ are dense in the space of currents relative to $\mathcal{A}$.
    Subgroup, Geodesic, Conjugacy class, Rank, Automorphism, Free group, Radon measure, Nonnegative, Hyperbolic group, Star, ...
  • We initiate the study of torsion-free algebraically hyperbolic groups; these groups generalise, and are intricately related to, groups with no Baumslag-Solitar subgroups. Indeed, for groups of cohomological dimension 2 we prove that algebraic hyperbolicity is equivalent to containing no Baumslag-Solitar subgroups. This links algebraically hyperbolic groups to two famous questions of Gromov; recent work has shown these questions to have negative answers in general, but they remain open for groups of cohomological dimension 2. We also prove that algebraically hyperbolic groups are CSA, and so have canonical abelian JSJ-decompositions. In the two-generated case we give a precise description of the form of these decompositions.
    Subgroup, Torsion tensor, Hyperbolic group, Cohomological dimension, Free group, Graph, Rank, Classifying space, HNN extension, Isomorphism, ...
  • Named entity recognition is a challenging task that has traditionally required large amounts of knowledge in the form of feature engineering and lexicons to achieve high performance. In this paper, we present a novel neural network architecture that automatically detects word- and character-level features using a hybrid bidirectional LSTM and CNN architecture, eliminating the need for most feature engineering. We also propose a novel method of encoding partial lexicon matches in neural networks and compare it to existing approaches. Extensive evaluation shows that, given only tokenized text and publicly available word embeddings, our system is competitive on the CoNLL-2003 dataset and surpasses the previously reported state of the art performance on the OntoNotes 5.0 dataset by 2.13 F1 points. By using two lexicons constructed from publicly-available sources, we establish new state of the art performance with an F1 score of 91.62 on CoNLL-2003 and 86.28 on OntoNotes, surpassing systems that employ heavy feature engineering, proprietary lexicons, and rich entity linking information.
    Neural network, Hyperparameter, Architecture, F1 score, Google.com, Recurrent neural network, Computational linguistics, Network model, Hybridization, Word embedding, ...
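    The word-level core of such a tagger is compact in PyTorch; the sketch below is a deliberately stripped-down version (no character-level CNN, lexicon features, or CRF-style decoding, all of which the paper adds) with made-up vocabulary and tag-set sizes.

      import torch
      import torch.nn as nn

      class BiLSTMTagger(nn.Module):
          def __init__(self, vocab_size, num_tags, emb_dim=100, hidden=200):
              super().__init__()
              self.emb = nn.Embedding(vocab_size, emb_dim)
              self.lstm = nn.LSTM(emb_dim, hidden, batch_first=True,
                                  bidirectional=True)
              self.out = nn.Linear(2 * hidden, num_tags)

          def forward(self, token_ids):          # (batch, seq_len)
              h, _ = self.lstm(self.emb(token_ids))
              return self.out(h)                 # (batch, seq_len, num_tags)

      model = BiLSTMTagger(vocab_size=10000, num_tags=9)   # e.g. CoNLL-style BIO tags
      tokens = torch.randint(0, 10000, (2, 12))            # random stand-in batch
      logits = model(tokens)
      loss = nn.CrossEntropyLoss()(logits.reshape(-1, 9),
                                   torch.randint(0, 9, (2 * 12,)))
      loss.backward()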
  • Low-dimensional materials with broken inversion symmetry and strong spin-orbit coupling can give rise to fascinating quantum phases and phase transitions. Here we report coexistence of superconductivity and ferromagnetism below 2.5 K in the quasi-one-dimensional crystals of non-centrosymmetric (TaSe$_4$)$_3$I (space group: $P\bar{4}2_1c$). The unique phase is a direct consequence of inversion symmetry breaking as the same material also stabilizes in a centro-symmetric structure (space group: $P4/mnc$) where it behaves like a non-magnetic insulator. The coexistence here upfront contradicts the popular belief that superconductivity and ferromagnetism are two apparently antagonistic phenomena. Notably, here, for the first time, we have clearly detected the Meissner effect in the superconducting state despite the coexisting ferromagnetic order. The coexistence of superconductivity and ferromagnetism projects non-centrosymmetric (TaSe$_4$)$_3$I as a host for complex ground states of quantum matter including possible unconventional superconductivity with elusive spin-triplet pairing.
    Superconductivity, Ferromagnetism, Magnetization, Single crystal, Phase transitions, Charge density wave, Aharonov-Bohm effect, Cooling, Critical current, Non-magnetic, ...
  • We present a comprehensive inter-comparison of linear regression (LR), stochastic, and deep-learning approaches for reduced-order statistical emulation of ocean circulation. The reference dataset is provided by an idealized, eddy-resolving, double-gyre ocean circulation model. Our goal is to conduct a systematic and comprehensive assessment and comparison of skill, cost, and complexity of statistical models from the three methodological classes. The model based on LR is considered as a baseline. Additionally, we investigate its additive white noise augmentation and a multi-level stochastic approach, deep-learning methods, hybrid frameworks (LR plus deep-learning), and simple stochastic extensions of deep-learning and hybrid methods. The assessment metrics considered are: root mean squared error, anomaly cross-correlation, climatology, variance, frequency map, forecast horizon, and computational cost. We found that the multi-level linear stochastic approach performs the best for both short- and long-timescale forecasts. The deep-learning hybrid models augmented by additive state-dependent white noise came second, while their deterministic counterparts failed to reproduce the characteristic frequencies in climate-range forecasts. Pure deep learning implementations performed worse than LR and its noise augmentations. Skills of LR and its white noise extension were similar on short timescales, but the latter performed better on long timescales, while LR-only outputs decay to zero for long simulations. Overall, our analysis promotes multi-level LR stochastic models with memory effects, and hybrid models with linear dynamical core augmented by additive stochastic terms learned via deep learning, as a more practical, accurate, and cost-effective option for ocean emulation than pure deep-learning solutions.
    Deep learning, Artificial neural network, Long short term memory, Principal component, White noise, Mean squared error, Climate, Horizon, Regression, Eddy, ...
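    The two simplest models in the comparison are easy to sketch (illustrative only; the paper's emulators act on reduced-order coordinates of the ocean model, whereas the toy state here is a made-up two-dimensional system): an ordinary least-squares one-step predictor, and its augmentation with additive white noise estimated from the fit residuals.

      import numpy as np
      from sklearn.linear_model import LinearRegression

      rng = np.random.default_rng(0)

      # toy "truth": a damped rotation plus noise, standing in for the reduced state x_t
      A = np.array([[0.99, -0.05], [0.05, 0.99]])
      x = np.zeros((2000, 2)); x[0] = [1.0, 0.0]
      for t in range(1999):
          x[t + 1] = A @ x[t] + 0.02 * rng.standard_normal(2)

      # fit x_{t+1} ~ x_t by least squares (the LR baseline)
      lr = LinearRegression().fit(x[:-1], x[1:])
      resid = x[1:] - lr.predict(x[:-1])
      sigma = resid.std(axis=0)                  # white-noise amplitude from residuals

      def emulate(x0, steps, stochastic=True):
          traj = [x0]
          for _ in range(steps):
              nxt = lr.predict(traj[-1][None])[0]
              if stochastic:                     # additive white-noise augmentation
                  nxt = nxt + sigma * rng.standard_normal(2)
              traj.append(nxt)
          return np.array(traj)

      print(emulate(x[-1], 100).shape)           # (101, 2)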
  • Recent large-area, deep CO surveys in the Galactic disk have revealed the formation of ~50 high-mass stars or clusters triggered by cloud-cloud collisions (CCCs). Although the Galactic Center (GC) -- which contains the highest volume density of molecular gas -- is the most favorable place for cloud collisions, systematic studies of CCCs in that region are still untouched. Here we report for the first time evidence of CCCs in the common foot point of molecular loops 1 and 2 in the GC. We have investigated the distribution of molecular gas toward the foot point by using a methodology for identifying CCCs, and we have discovered clear signatures of CCCs. Using the estimated displacements and relative velocities of the clouds, we find the elapsed time since the beginnings of the collisions to be $10^{5-6}$ yr. We consider possible origins for previously reported peculiar velocity features in the foot point and discuss star formation triggered by CCCs in the GC.
    Galactic Center, High mass star, Galactic disks, Intensity, Kinematics, Star formation, Line of sight, Telescopes, Blue shift, Velocity dispersion, ...
  • The importance of this working document is that it allows the analysis of information and cases associated with COVID-19 (SARS-CoV-2), based on the daily information generated by the Government of Mexico through the Secretariat of Health, responsible for the Epidemiological Surveillance System for Viral Respiratory Diseases (SVEERV). The information in the SVEERV is disseminated as open data, and the level of information is displayed at the municipal, state and national levels. On the other hand, the monitoring of the genomic surveillance of COVID-19 (SARS-CoV-2), through the identification of variants and mutations, is registered in the database of the Information System of the Global Initiative on Sharing All Influenza Data (GISAID), based in Germany. These two sources of information, SVEERV and GISAID, provide the information for the analysis of the impact of COVID-19 (SARS-CoV-2) on the population in Mexico. The first data source identifies information, at the national level, on patients according to age, sex, comorbidities and COVID-19 (SARS-CoV-2) status, among other characteristics. The data analysis is carried out by means of an algorithm applying data mining techniques and methodology to estimate the case fatality rate and positivity index, and to identify a typology according to the severity of the infection in patients who test positive for COVID-19 (SARS-CoV-2). From the second data source, information is obtained worldwide on the new variants and mutations of COVID-19 (SARS-CoV-2), providing valuable information for timely genomic surveillance. This study analyzes the impact of COVID-19 (SARS-CoV-2) on the indigenous-language-speaking population, and allows us to provide information quickly and in a timely manner to support the design of public health policy.
    COVID 19, Mexico, Mutation, Open data, Data mining, Language, Algorithms, ...
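    The two headline indicators can be computed in a few lines of pandas; the sketch below is only illustrative, and the file name and column names are hypothetical placeholders rather than the actual SVEERV open-data schema.

      import pandas as pd

      df = pd.read_csv("sveerv_cases.csv")               # hypothetical file name

      tested = df[df["test_result"].notna()]
      positive = tested[tested["test_result"] == "positive"]

      positivity = len(positive) / len(tested)           # positivity index
      cfr = positive["date_of_death"].notna().mean()     # case fatality rate among positives

      # the same indicators broken out for indigenous-language speakers
      by_group = (positive.assign(died=positive["date_of_death"].notna())
                          .groupby("speaks_indigenous_language")["died"].mean())

      print(f"positivity: {positivity:.3f}, CFR: {cfr:.3f}")
      print(by_group)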
  • Derived actions in the category of groups with action on itself $\mathbf{Gr}^{\bullet}$ are defined and described. This category plays a crucial role in the solution of Loday's two problems stated in the literature. A full subcategory of reduced groups with action $\mathbf{rGr}^{\bullet}$ of $\mathbf{Gr}^{\bullet}$ is introduced, which is not a category of interest but has some properties that can be applied in the investigation of action representability in this category; these properties are similar to those which were used in the construction of universal strict general actors in the category of interest. Semi-direct product constructions are given in $\mathbf{Gr}^{\bullet}$ and $\mathbf{rGr}^{\bullet}$ and it is proved that an action is a derived action in $\mathbf{Gr}^{\bullet}$ (resp. $\mathbf{rGr}^{\bullet}$) if and only if the corresponding semi-direct product is an object of $\mathbf{Gr}^{\bullet}$ (resp. $\mathbf{rGr}^{\bullet}$). The results obtained in this paper will be applied in the forthcoming paper on the representability of actions in the category $\mathbf{rGr}^{\bullet}$.
    Category of groups, Subcategory, Group homomorphism, Group action, Abelian category, Morphism, Homomorphism, Normal subgroup, Free group, Action, ...
  • Since the discovery of quantum mechanics, the fact that the wavefunction is defined on the $3\mathbf{n}$-dimensional configuration space rather than on the $3$-dimensional space seemed uncanny to many, including Schrödinger, Lorentz, and Einstein. This continues to be seen as an important problem in the foundations of quantum mechanics even today. Despite this, in Phys. Rev. A 100, 042115 (2019) (arXiv:1906.12229) it was shown that it is possible to represent the wavefunction as classical fields on space or spacetime, although in a rather complicated way, meant as a proof of concept. But in this article it will be shown that the wavefunction already is a genuine object on space. While this may seem surprising, the wavefunction has no qualitatively new features that were not previously encountered in the objects known from Euclidean geometry and classical physics. This will be shown to be true also in Felix Klein's Erlangen Program. The relation with the classification of quantum particles by the representations of the spacetime isometries realized by Wigner and Bargmann is discussed. It is argued that overwhelming empirical support already shows that the wavefunction is an object on space.
    Wavefunction, Fiber bundle, Isometry, Klein geometry, Manifold, Coset, Bundle, Pilot wave, Quantum mechanics, Symmetry group, ...
  • We develop a number of basic concepts in the theory of categories internal to an $\infty$-topos. We discuss adjunctions, limits and colimits as well as Kan extensions for internal categories, and we use these results to prove the universal property of internal presheaf categories. We furthermore construct the free cocompletion of an internal category by colimits that are indexed by an arbitrary class of diagram shapes.
    Fibration, Groupoid, Subcategory, Monomorphism, Morphism, Embedding, Commutative diagram, Kan extension, Factorisation, Epimorphism, ...
  • Inspired by the foundational works by Spivak and Fong, and by Cruttwell et al., we introduce a categorical framework to formalize Bayesian inference and learning. The two key ideas at play here are the notions of Bayesian inversions and the functor GL as constructed by Cruttwell et al. In this context, we find that Bayesian learning is the simplest case of the learning paradigm. We then obtain categorical formulations of batch and sequential Bayes updates while also verifying that the two coincide in a specific example.
    Morphism, Bayesian, Training set, Isomorphism, Category theory, Inference, Bayesian approach, Attention, Bayes' theorem, Bayes' rule, ...
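    The classical fact that the categorical treatment abstracts (assuming the two batches $D_1$ and $D_2$ are conditionally independent given the parameter $\theta$): updating sequentially, $P(\theta \mid D_1) \propto P(D_1 \mid \theta)\,P(\theta)$ and then $P(\theta \mid D_1, D_2) \propto P(D_2 \mid \theta)\,P(\theta \mid D_1)$, yields $P(\theta \mid D_1, D_2) \propto P(D_1 \mid \theta)\,P(D_2 \mid \theta)\,P(\theta)$, which is exactly the batch update on the pooled data.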
  • The basic concepts of category theory are developed, and examples are presented to illustrate them using tools from measurement theory and probability theory. Motivated by Perrone's work (arXiv:1912.10642), where notes on category theory are developed with examples from basic mathematics, we present the concepts of category, functor, natural transformation, and products, with examples in the probabilistic context. The most prominent examples of the application of Category Theory to Probability Theory are the Lawvere (available at ncatlab.org/nlab/files/lawvereprobability1962.pdf) and Giry (available at https://doi.org/10.1007/BFb0092872) approaches. However, there are few categories with probability spaces as objects, due to the difficulty of finding an appropriate condition to define arrows between them.
    Category theory, Probability theory, Theory, Object, Transformations, Measurement, Probability, ...
  • Streaming cosmic rays can power the exponential growth of a seed magnetic field by exciting a non-resonant instability that feeds on their bulk kinetic energy. By generating the necessary turbulent magnetic field, the instability is thought to play a key role in the confinement and acceleration of cosmic rays at shocks. In this work we present hybrid-Particle-In-Cell simulations of the instability including Monte Carlo collisions. Simulations of poorly ionized plasmas confirm the rapid damping of the instability by proton-neutral collisions predicted by linear fluid theory calculations. In contrast we find that Coulomb collisions do not oppose the growth of the magnetic field, but under certain conditions suppress the pressure anisotropies generated by the instability and actually enhance the magnetic field amplification.
    InstabilityCosmic rayAnisotropyCoulomb collisionMagnetic energyTwo-stream instabilityMonte Carlo methodParticle-in-cellIntensityConfinement...
  • Observations of structure at sub-galactic scales are crucial for probing the properties of dark matter, which is the dominant source of gravity in the universe. It will become increasingly important for future surveys to distinguish between line-of-sight halos and subhalos to avoid erroneous inferences about the nature of dark matter. We reanalyze a sub-galactic structure (in lens JVAS B1938+666) that has been previously found using the gravitational imaging technique in galaxy-galaxy lensing systems. This structure has been assumed to be a satellite in the halo of the main lens galaxy. We fit the redshift of the perturber of the system as a free parameter, using the multi-plane thin-lens approximation, and find that the redshift of the perturber is $z_\mathrm{int} = 1.22^{+0.11}_{-0.11}$ (with a main lens redshift of $z=0.881$). Our analysis indicates that this structure is more massive than the previous result by more than an order of magnitude. This constitutes the first dark perturber shown to be a line-of-sight halo with a gravitational lensing method.
    Dark matter subhaloNavarro-Frenk-White profileDark matterSingular isothermal sphere profileLine-of-sight substructureSubhalo mass functionGravitational lensingLine of sightPoint spread functionBayesian information criterion...
  • This paper surveys and organizes research works in a new paradigm in natural language processing, which we dub "prompt-based learning". Unlike traditional supervised learning, which trains a model to take in an input x and predict an output y as P(y|x), prompt-based learning is based on language models that model the probability of text directly. To use these models to perform prediction tasks, the original input x is modified using a template into a textual string prompt x' that has some unfilled slots, and then the language model is used to probabilistically fill in the unfilled information to obtain a final string x̂, from which the final output y can be derived (see the sketch below this entry). This framework is powerful and attractive for a number of reasons: it allows the language model to be pre-trained on massive amounts of raw text, and by defining a new prompting function the model is able to perform few-shot or even zero-shot learning, adapting to new scenarios with few or no labeled data. In this paper we introduce the basics of this promising paradigm, describe a unified set of mathematical notations that can cover a wide variety of existing work, and organize existing work along several dimensions, e.g., the choice of pre-trained models, prompts, and tuning strategies. To make the field more accessible to interested beginners, we not only provide a systematic review of existing works and a highly structured typology of prompt-based concepts, but also release other resources, e.g., a website (http://pretrain.nlpedia.ai/) that includes a constantly updated survey and paper list.
    EngineeringComputational linguisticsTraining setText ClassificationEmbeddingAttentionNatural languageKnowledge baseArchitectureSupervised learning...
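A minimal sketch of the template-then-fill workflow described in the survey above, assuming the HuggingFace `transformers` library and a masked language model; the template, label words, and model choice are illustrative assumptions, not prescriptions from the paper.

```python
# Minimal prompt-based (zero-shot) sentiment classification with a masked LM,
# following the template-then-fill pattern: wrap x in a prompt with a slot,
# let the LM fill the slot, map filled words back to labels.
# Assumes the HuggingFace `transformers` package is installed.
from transformers import pipeline

fill_mask = pipeline("fill-mask", model="bert-base-uncased")

def classify(text: str) -> str:
    # 1) Turn the input x into a prompt x' with an unfilled slot.
    prompt = f"{text} Overall, the movie was [MASK]."
    # 2) Let the language model fill the slot probabilistically.
    candidates = {c["token_str"]: c["score"] for c in fill_mask(prompt, top_k=100)}
    # 3) Map filled words back to task labels y via a small verbalizer.
    verbalizer = {"positive": ["great", "good", "wonderful"],
                  "negative": ["terrible", "bad", "awful"]}
    scores = {label: sum(candidates.get(w, 0.0) for w in words)
              for label, words in verbalizer.items()}
    return max(scores, key=scores.get)

print(classify("The plot was gripping and the acting superb."))
```

The template shape, the verbalizer (answer words), and which parameters to tune are exactly the kinds of design choices the survey organizes into its typology.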
  • State-of-the-art computer vision systems are trained to predict a fixed set of predetermined object categories. This restricted form of supervision limits their generality and usability since additional labeled data is needed to specify any other visual concept. Learning directly from raw text about images is a promising alternative which leverages a much broader source of supervision. We demonstrate that the simple pre-training task of predicting which caption goes with which image is an efficient and scalable way to learn SOTA image representations from scratch on a dataset of 400 million (image, text) pairs collected from the internet (a sketch of this contrastive objective follows this entry). After pre-training, natural language is used to reference learned visual concepts (or describe new ones), enabling zero-shot transfer of the model to downstream tasks. We study the performance of this approach by benchmarking on over 30 different existing computer vision datasets, spanning tasks such as OCR, action recognition in videos, geo-localization, and many types of fine-grained object classification. The model transfers non-trivially to most tasks and is often competitive with a fully supervised baseline without the need for any dataset-specific training. For instance, we match the accuracy of the original ResNet-50 on ImageNet zero-shot without needing to use any of the 1.28 million training examples it was trained on. We release our code and pre-trained model weights at https://github.com/OpenAI/CLIP.
    Natural languageImage ProcessingTraining setOptical Character RecognitionLogistic regressionEmbeddingArchitectureComputational linguisticsDeep learningAttention...
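The core of the caption-matching pre-training task described above can be written as a symmetric cross-entropy over an N x N similarity matrix. The sketch below uses NumPy with random features standing in for the image and text encoders and a fixed temperature; it mirrors the general recipe, not OpenAI's exact implementation.

```python
# Symmetric contrastive objective for a batch of N (image, text) pairs: each
# image must pick out its own caption and vice versa. Random features stand in
# for the encoder outputs.
import numpy as np

rng = np.random.default_rng(0)
N, d = 8, 64                               # batch size, embedding dimension

img_emb = rng.standard_normal((N, d))      # stand-in for image_encoder(images)
txt_emb = rng.standard_normal((N, d))      # stand-in for text_encoder(captions)

# L2-normalize and form the N x N cosine-similarity matrix, scaled by a
# temperature (fixed here; typically learnable).
img_emb /= np.linalg.norm(img_emb, axis=1, keepdims=True)
txt_emb /= np.linalg.norm(txt_emb, axis=1, keepdims=True)
logits = img_emb @ txt_emb.T / 0.07

def cross_entropy(logits, labels):
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(labels)), labels].mean()

labels = np.arange(N)                       # the i-th image matches the i-th text
loss = 0.5 * (cross_entropy(logits, labels)        # image -> text direction
              + cross_entropy(logits.T, labels))   # text -> image direction
print(loss)
```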
  • Pre-trained representations are becoming crucial for many NLP and perception tasks. While representation learning in NLP has transitioned to training on raw text without human annotations, visual and vision-language representations still rely heavily on curated training datasets that are expensive or require expert knowledge. For vision applications, representations are mostly learned using datasets with explicit class labels such as ImageNet or OpenImages. For vision-language, popular datasets like Conceptual Captions, MSCOCO, or CLIP all involve a non-trivial data collection (and cleaning) process. This costly curation process limits the size of datasets and hence hinders the scaling of trained models. In this paper, we leverage a noisy dataset of over one billion image alt-text pairs, obtained without the expensive filtering or post-processing steps used in the Conceptual Captions dataset. A simple dual-encoder architecture learns to align visual and language representations of the image and text pairs using a contrastive loss. We show that the scale of our corpus can make up for its noise and leads to state-of-the-art representations even with such a simple learning scheme. Our visual representation achieves strong performance when transferred to classification tasks such as ImageNet and VTAB. The aligned visual and language representations enable zero-shot image classification and also set new state-of-the-art results on Flickr30K and MSCOCO image-text retrieval benchmarks, even when compared with more sophisticated cross-attention models. The representations also enable cross-modality search with complex text and text + image queries (a toy retrieval sketch follows this entry).
    Training setEmbeddingAttentionArchitectureComputational linguisticsNatural languageHyperparameterAblationFully connected layerWord vectors...
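A toy illustration of what the aligned dual-encoder space buys at inference time, per the entry above: image-text retrieval becomes cosine-similarity nearest-neighbor search, and a combined text + image query can be approximated by summing embeddings. Random unit vectors stand in for the two towers; the combination rule is an assumption for illustration, not necessarily the paper's exact method.

```python
# Once a dual-encoder model aligns images and text in one embedding space,
# image-text retrieval is nearest-neighbor search by cosine similarity.
import numpy as np

def normalize(v):
    return v / np.linalg.norm(v, axis=-1, keepdims=True)

# Pretend these came from the image tower and the text tower.
image_index = normalize(np.random.default_rng(1).standard_normal((1000, 128)))
captions = ["a dog catching a frisbee", "a red sports car", "a bowl of ramen"]
text_queries = normalize(np.random.default_rng(2).standard_normal((3, 128)))

# Rank all indexed images for each text query (text -> image retrieval).
scores = text_queries @ image_index.T          # cosine similarity (unit vectors)
top5 = np.argsort(-scores, axis=1)[:, :5]      # indices of the 5 best images
for caption, hits in zip(captions, top5):
    print(caption, "->", hits.tolist())

# A combined "text + image" query, as in the cross-modality search above, can
# be approximated by summing the two query embeddings and re-normalizing.
combined = normalize(text_queries[0] + image_index[0])
print(np.argsort(-(image_index @ combined))[:5].tolist())
```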
  • We show the improvement in cosmological constraints from galaxy cluster surveys with the addition of CMB-cluster lensing data. We explore the cosmological implications of adding mass information from the 3.1$\sigma$ detection of gravitational lensing of the cosmic microwave background (CMB) by galaxy clusters to the Sunyaev-Zel'dovich (SZ) selected galaxy cluster sample from the 2500 deg$^2$ SPT-SZ survey and targeted optical and X-ray follow-up data. In the $\Lambda$CDM model, the combination of the cluster sample with the Planck power spectrum measurements prefers $\sigma_8 \left(\Omega_m/0.3 \right)^{0.5} = 0.831 \pm 0.020$. Adding the cluster data reduces the uncertainty on this quantity by a factor of 1.4, which is unchanged whether or not the 3.1$\sigma$ CMB-cluster lensing measurement is included. We then forecast the impact of CMB-cluster lensing measurements with future cluster catalogs. Adding CMB-cluster lensing measurements to the SZ cluster catalog of the ongoing SPT-3G survey is expected to improve the constraint on the dark energy equation of state $w$ by a factor of 1.3, to $\sigma(w) = 0.19$. We find the largest improvements from CMB-cluster lensing measurements to be for $\sigma_8$, where adding CMB-cluster lensing data to the cluster number counts reduces the expected uncertainty on $\sigma_8$ by factors of 2.4 and 3.6 for SPT-3G and CMB-S4, respectively.
    Cluster lensingCluster of galaxiesCMB-S4Cosmological constraintsWeak lensingWeak lensing mass estimateSPT-SZ surveyCluster samplingCalibrationCluster number counts...
  • The CO Mapping Array Project (COMAP) aims to use line intensity mapping of carbon monoxide (CO) to trace the distribution and global properties of galaxies over cosmic time, back to the Epoch of Reionization (EoR). To validate the technologies and techniques needed for this goal, a Pathfinder instrument has been constructed and fielded. Sensitive to CO(1-0) emission from $z=2.4$-$3.4$ and a fainter contribution from CO(2-1) at $z=6$-$8$, the Pathfinder is surveying $12$ deg$^2$ in a 5-year observing campaign to detect the CO signal from $z\sim3$. Using data from the first 13 months of observing, we estimate $P_\mathrm{CO}(k) = (-2.7 \pm 1.7) \times 10^4\,\mu\mathrm{K}^2\,\mathrm{Mpc}^3$ on scales $k=0.051$-$0.62\,\mathrm{Mpc}^{-1}$, the first direct 3D constraint on the clustering component of the CO(1-0) power spectrum. Based on these observations alone, we obtain a constraint on the amplitude of the clustering component (the squared mean CO line temperature-bias product) of $\langle Tb\rangle^2<49$ $\mu$K$^2$, nearly an order-of-magnitude improvement on the previous best measurement. These constraints allow us to rule out two models from the literature. We forecast a detection of the power spectrum after 5 years with signal-to-noise ratio (S/N) 9-17. Cross-correlation with an overlapping galaxy survey will yield a detection of the CO-galaxy power spectrum with S/N of 19. We are also conducting a 30 GHz survey of the Galactic plane and present a preliminary map. Looking to the future of COMAP, we examine the prospects for future phases of the experiment to detect and characterize the CO signal from the EoR.
    GalaxySignal to noise ratioEpoch of reionizationLarge scale structure surveyLuminosityLine intensity mappingGalactic planeCalibrationCross-correlationTelescopes...