Recently bookmarked papers

with concepts:
  • Classical physics is generally regarded as deterministic, as opposed to quantum mechanics, which is considered the first theory to have introduced genuine indeterminism into physics. We challenge this view by arguing that the alleged determinism of classical physics relies on the tacit, metaphysical assumption that there exists an actual value of every physical quantity, with its infinite predetermined digits (which we name \emph{principle of infinite precision}). Building on recent information-theoretic arguments showing that the principle of infinite precision (which translates into the attribution of a physical meaning to mathematical real numbers) leads to unphysical consequences, we consider possible alternative indeterministic interpretations of classical physics. We also link those to well-known interpretations of quantum mechanics. In particular, we propose a model of classical indeterminism based on \emph{finite information quantities} (FIQs). Moreover, we discuss the perspectives that an indeterministic physics could open (such as strong emergence), as well as some potential problematic issues. Finally, we make evident that any indeterministic interpretation of physics would have to deal with the problem of explaining how the indeterminate values become determinate, a problem known in the context of quantum mechanics as (part of) the ``quantum measurement problem''. We discuss some similarities between the classical and the quantum measurement problems, and propose ideas for possible solutions (e.g., ``collapse models'' and ``top-down causation'').
    Quantum mechanics, Measurement problem, Quantum measurement, Interpretations of quantum mechanics, Phase space, Quantum theory, Copenhagen interpretation, Information theory, Infinitesimal, Second law of thermodynamics...
  • The event rates for WIMP-nucleus and neutrino-nucleus scattering processes, expected to be detected at ton-scale rare-event detectors, are investigated. We focus on nuclear isotopes that correspond to the target nuclei of current and future experiments looking for WIMP- and neutrino-nucleus events. The nuclear structure calculations, performed in the context of the Deformed Shell Model, are based on Hartree-Fock intrinsic states with angular momentum projection and band mixing for both the elastic and inelastic channels. Our predictions in the high recoil-energy tail show that detectable distortions of the measured/expected signal may be interpreted through the inclusion of non-negligible incoherent channels.
    Weakly interacting massive particle, Neutrino, Dark matter, Isotope, Hartree-Fock approximation, Form factor, Neutrino-nucleus scattering, Shell model, Spin structure, Earth...
  • We report direct-detection constraints on light dark matter particles interacting with electrons. The results are based on a method that exploits the extremely low levels of leakage current of the DAMIC detector at SNOLAB of 2-6$\times$10$^{-22}$ A cm$^{-2}$. We evaluate the charge distribution of pixels that collect $<10~\rm{e^-}$ for contributions beyond the leakage current that may be attributed to dark matter interactions. Constraints are placed on so-far unexplored parameter space for dark matter masses between 0.6 and 100 MeV$c^{-2}$. We also present new constraints on hidden-photon dark matter with masses in the range $1.2$-$30$ eV$c^{-2}$.
    Dark matter, Dark matter particle, Hidden photon, Light dark matter, Dark matter particle mass, Ionization, Form factor, Weakly interacting massive particle, Calibration, Multidimensional Array...
  • The Alice ultraviolet spectrometer onboard the Rosetta space mission observed several spectroscopic emissions emanating from volatile species of comet 67P/Churyumov-Gerasimenko (hereafter 67P/C-G) during its entire escorting phase. We have developed a photochemical model for comet 67P/C-G to study the atomic hydrogen (HI 1216, 1025, & 973 Ang), oxygen (OI 1152, 1304, & 1356 Ang), and carbon (CI 1561 & 1657 Ang) line emissions by accounting for major production pathways. The developed model has been used to calculate the emission intensities of these lines as a function of nucleocentric projected distance and also along the nadir view by varying the input parameters, viz., neutral abundances and cross sections. We have quantified the percentage contributions of photon and electron impact dissociative excitation processes to the total intensity of the emission lines, which has important relevance for the analysis of Alice observed spectra. It is found that in comet 67P/C-G, which had a neutral gas production rate of about 10$^{27}$ s$^{-1}$ when it was at 1.56 AU from the Sun, photodissociative excitation processes are more significant compared to electron impact reactions in determining the atomic emission intensities. Based on our model calculations, we suggest that the observed atomic hydrogen, oxygen, and carbon emission intensities can be used to derive H$_2$O, O$_2$, and CO abundances, respectively, rather than electron density in the coma of 67P/C-G, when the comet has a gas production rate of $\ge$ 10$^{27}$ s$^{-1}$.
    Intensity, Comet, Coma of a comet, Photodissociation, Astronomical Unit, Resonance fluorescence, Sun, Nadir, Line emission, Spectrometers...
  • The vertical diffusive halo size of the Galaxy, $L$, is a key parameter for dark matter indirect searches. It can be better determined thanks to recent AMS-02 data. We set constraints on $L$ from Be/B and $^{10}$Be/Be data, and perform a consistency check with positron data. We detail the dependence of Be/B and $^{10}$Be/Be on $L$ and forecast in which energy range better data would help future $L$ improvements. We use USINE v3.5 for the propagation of nuclei, and $e^+$ are calculated with the pinching method of Boudaud et al. (2017). The current AMS-02 Be/B ($\sim3\%$ precision) and ACE-CRIS $^{10}$Be/Be ($\sim 10\%$ precision) data bring similar and consistent constraints on $L$. The AMS-02 Be/B data alone constrain $L=5^{+3}_{-2}$ kpc at $1\sigma$, a range for which most models do not overproduce positrons. Future experiments need to deliver percent-level accuracy on $^{10}$Be/$^9$Be anywhere below 10 GV to further constrain $L$. Forthcoming AMS-02, HELIX, and PAMELA $^{10}$Be/$^9$Be results will further test and possibly tighten the limits derived here. Elemental ratios involving radioactive species with different lifetimes (e.g., Al/Mg and Cl/Ar) are also awaited to provide complementary and more robust constraints.
    Positron, Alpha Magnetic Spectrometer, Cosmic ray, Isotopy, Solar modulation, Milky Way, PAMELA experiment, Production cross-section, Diffusion coefficient, Dark matter...
  • The vertical temperature structure of a protoplanetary disk bears on several processes relevant to planet formation, such as gas and dust grain chemistry, ice lines and convection. The temperature profile is controlled by irradiation from the central star and by any internal source of heat as caused by gas accretion. We investigate the heat and angular momentum transport generated by the resistive dissipation of magnetic fields in laminar disks. We use local one-dimensional simulations to obtain vertical temperature profiles for typical conditions in the inner disk (0.5 to 4 au). Using simple assumptions for the gas ionization and opacity, the heating and cooling rates are computed self-consistently in the framework of radiative non-ideal magnetohydrodynamics. We characterize steady solutions that are symmetric about the midplane and which may be associated with saturated Hall-shear unstable modes. We also examine the dissipation of electric currents driven by global accretion-ejection structures. In both cases we obtain significant heating for a sufficiently high opacity. Sufficiently strong magnetic fields can induce an order-unity temperature increase in the disk midplane, a convectively unstable entropy profile, and a surface emissivity equivalent to a viscous heating of $\alpha \sim 10^{-2}$. These results show how magnetic fields may drive efficient accretion and heating in weakly ionized disks where turbulence might be inefficient, at least for a range of radii and ages of the disk.
    Accretion, Dissipation, Opacity, Instability, Steady state, Ionization fraction, Protoplanetary disk, Dust grain, Magnetization, Ionization...
  • Deep convolutional neural networks have been a popular tool for image generation and restoration. The performance of these networks is related to the capability of learning realistic features from a large dataset. In this work, we address the problem of inpainting a non-Gaussian signal, in the context of Galactic diffuse emissions in the millimetric and sub-millimetric regimes, specifically synchrotron and thermal dust emission. Both of them are affected by contamination at small angular scales due to extra-galactic radio sources (the former) and to dusty star-forming galaxies (the latter). We assess the performance of a nearest-neighbors inpainting technique and compare it with two novel methodologies relying on generative neural networks. We show that the generative network is able to reproduce the statistical properties of the ground truth signal more consistently, at high confidence level. The Python Inpainter for Cosmological and AStrophysical SOurces (PICASSO) is a package encoding a suite of inpainting methods described in this work and has been made publicly available.
    Generative Adversarial Net, Synchrotron, Ground truth, Deep convolutional neural networks, Intensity, Minkowski functional, Point source, Statistics, Attention, Dust emission...
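As a toy illustration of the baseline method mentioned above (a sketch, not the PICASSO code; `nn_inpaint` is a hypothetical helper), nearest-neighbors inpainting fills each masked pixel with the value of the closest valid pixel on the grid:

```python
import numpy as np

def nn_inpaint(img, mask):
    """Fill masked pixels (mask == True) with the value of the nearest
    unmasked pixel (Euclidean distance on the pixel grid)."""
    filled = img.copy()
    valid = np.argwhere(~mask)            # coordinates of known pixels
    holes = np.argwhere(mask)             # coordinates to fill
    for y, x in holes:
        d2 = ((valid - (y, x)) ** 2).sum(axis=1)
        ny, nx = valid[d2.argmin()]       # closest known pixel
        filled[y, x] = img[ny, nx]
    return filled

img = np.arange(16.0).reshape(4, 4)
mask = np.zeros((4, 4), dtype=bool)
mask[1, 1] = True                         # one "contaminated" pixel
out = nn_inpaint(img, mask)
```

Unmasked pixels pass through untouched; only the hole is replaced by a neighboring value.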
  • We perform a-priori validation tests of subgrid-scale (SGS) models for the turbulent transport of momentum, energy and passive scalars. To this end, we conduct two sets of high-resolution hydrodynamical simulations with a Lagrangian code: an isothermal turbulent box with rms Mach number of 0.3, 2 and 8, and the classical wind tunnel where a cold cloud traveling through a hot medium gradually dissolves due to fluid instabilities. Two SGS models are examined: the eddy diffusivity (ED) model widely adopted in astrophysical simulations and the "gradient model" due to Clark et al. (1979). We find that both models predict the magnitude of the SGS terms equally well (correlation coefficient > 0.8). However, the gradient model provides excellent predictions on the orientation and shape of the SGS terms while the ED model predicts both poorly, indicating that isotropic diffusion is a poor approximation of the instantaneous turbulent transport. The best-fit coefficient of the gradient model is in the range of [0.16, 0.21] for the momentum transport, and the turbulent Schmidt number and Prandtl number are both close to unity, in the range of [0.92, 1.15].
    Orientation, Coarse graining, Diffusion coefficient, Mach number, Turbulence, Wind tunnel, Numerical diffusion, Instability, Schmidt number, Hydrodynamical simulations...
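The gradient model of Clark et al. can be sketched directly: it predicts the SGS stress from products of resolved velocity gradients, tau_ij ~ C dx^2 (d_k u_i)(d_k u_j). A minimal NumPy version (an illustration, not the authors' Lagrangian code; the test velocity field and coefficient value are made up for demonstration):

```python
import numpy as np

def gradient_model_stress(u, dx, C=0.18):
    """SGS stress tau_ij = C * dx^2 * sum_k d_k(u_i) d_k(u_j)
    for a 3D velocity field u of shape (3, N, N, N)."""
    # grads[i, k] = partial derivative of component i along direction k
    grads = np.stack([np.stack(np.gradient(u[i], dx), axis=0)
                      for i in range(3)])
    tau = C * dx**2 * np.einsum('ik...,jk...->ij...', grads, grads)
    return tau

N, dx = 16, 1.0 / 16
x = np.linspace(0, 1, N, endpoint=False)
X, Y, Z = np.meshgrid(x, x, x, indexing='ij')
# simple shear-like test field
u = np.stack([np.sin(2 * np.pi * Y), np.cos(2 * np.pi * Z), np.zeros_like(X)])
tau = gradient_model_stress(u, dx)
```

By construction the predicted tensor is symmetric and its diagonal is non-negative, which is part of what makes its orientation/shape predictions testable against the true SGS stress.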
  • I aim to clarify the physical content and significance of naturalness. Physicists' earliest understanding of naturalness, as an autonomy of scales (AoS) requirement, provides the most cogent definition of naturalness, and I will assert that i) this provides a uniform notion which undergirds a myriad of prominent naturalness conditions, ii) this is a reasonable criterion to impose on EFTs and iii) the successes and violations of naturalness are best understood when adhering to this notion of naturalness. I argue that this principle is neither an aesthetic nor a sociologically-influenced principle. I contend that naturalness may only be plausibly argued to be an aesthetic/sociological principle when formal measures of naturalness and their use in physics communities are conflated with the central dogma of naturalness - the former may indeed be argued to be sociologically-influenced and somewhat arbitrary - but these formal measures of naturalness are in fact less successful than AoS naturalness. I put forward arguments as to why AoS naturalness is well-defined and why it was reasonable for physicists to endorse this naturalness principle on both theoretical and empirical grounds. To date, no compelling reasons have appeared as to why the laws of nature should generically decouple into quasi-autonomous physical domains. A decoupling of scales in the quantum realm is often claimed to be entailed by the Decoupling Theorem (Cao and Schweber (1993)), yet I will show that this theorem is too weak to underwrite quasi-autonomous physical domains in quantum field theories because one should additionally impose that parameters be natural. Violations of naturalness would then have ontological import - unnatural parameters would not be accurately described by EFTs but rather by field theories exhibiting some kind of UV/IR interplay.
    Naturalness, Effective field theory, Standard Model, Higgs boson, Naturalness problem, Multiverse, Quantum field theory, Field theory, Supersymmetry, Cosmological constant...
  • Single Image Super Resolution (SISR) is a well-researched problem with broad commercial relevance. However, most of the SISR literature focuses on small-size images under 500px, whereas business needs can mandate the generation of very high resolution images. At Expedia Group, we were tasked with generating images of at least 2000px for display on the website, four times greater than the sizes typically reported in the literature. This requirement poses a challenge that state-of-the-art models, validated on small images, have not been proven to handle. In this paper, we investigate solutions to the problem of generating high-quality images for large-scale super resolution in a commercial setting. We find that training a generative adversarial network (GAN) with attention from scratch using a large-scale lodging image data set generates images with high PSNR and SSIM scores. We describe a novel attentional SISR model for large-scale images, A-SRGAN, that uses a Flexible Self Attention layer to enable processing of large-scale images. We also describe a distributed algorithm which speeds up training by around a factor of five.
    Generative Adversarial Net, Attention, Spectral normalization, Architecture, Deep learning, Image Processing, Google.com, Hyperparameter, Training set, Inductive transfer...
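PSNR, one of the two fidelity scores quoted above, is straightforward to compute from the mean squared error; a minimal sketch for 8-bit images (an illustration, not the paper's evaluation code):

```python
import numpy as np

def psnr(reference, test, max_val=255.0):
    """Peak signal-to-noise ratio in dB between two images."""
    mse = np.mean((reference.astype(float) - test.astype(float)) ** 2)
    if mse == 0:
        return float('inf')              # identical images
    return 10 * np.log10(max_val ** 2 / mse)

ref = np.full((8, 8), 100.0)
noisy = ref + 10.0                       # constant error -> MSE = 100
score = psnr(ref, noisy)                 # 10*log10(255^2/100) ~ 28.13 dB
```

Higher is better; a perfect reconstruction is unbounded (infinite PSNR), which is why SSIM is usually reported alongside it.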
  • Generative Adversarial Networks (GANs) in supervised settings can generate photo-realistic corresponding output from low-definition input (SRGAN). Using the architecture presented in the original SRGAN paper [2], we explore how dataset selection affects the outcome by using three different datasets, and find that SRGAN fundamentally learns objects, with their shape, color, and texture, and redraws them in the output rather than merely attempting to sharpen edges. This is further underscored by our demonstration that once the network learns the images of the dataset, it can generate a photo-like image from even a very blurry-edged sketch that gives only a slight hint of the original. Given a set of inference images, the network trained with the same dataset produces a better outcome than the one trained with an arbitrary set of images, and we report its significance numerically with the Fréchet Inception Distance score [22].
    Generative Adversarial Net, Architecture, Inference, Networks, Object, Sketch...
  • Enhanced Super-Resolution Generative Adversarial Network (ESRGAN) is a perceptual-driven approach for single image super resolution that is able to produce photorealistic images. Despite the visual quality of these generated images, there is still room for improvement. To this end, the model is extended to further improve the perceptual quality of the images. We have designed a novel block to replace the one used by the original ESRGAN. Moreover, we introduce noise inputs to the generator network in order to exploit stochastic variation. The resulting images present more realistic textures.
    Generative Adversarial Net, Architecture, Scale factor, Ground truth, Deep learning, Mean squared error, Manifold, Gaussian noise, Training set, Resolution...
  • Facial composites are graphical representations of an eyewitness's memory of a face. Many digital systems are available for the creation of such composites but are either unable to reproduce features unless previously designed or do not allow holistic changes to the image. In this paper, we improve the efficiency of composite creation by removing the reliance on expert knowledge and letting the system learn to represent faces from examples. The novel approach, Composite Generating GAN (CG-GAN), applies generative and evolutionary computation to allow casual users to easily create facial composites. Specifically, CG-GAN utilizes the generator network of a pg-GAN to create high-resolution human faces. Users are provided with several functions to interactively breed and edit faces. CG-GAN offers a novel way of generating and handling static and animated photo-realistic facial composites, with the possibility of combining multiple representations of the same perpetrator, generated by different eyewitnesses.
    Generative Adversarial Net, Mutation, Software, Freezing, Training set, Latent space, Fatigue, Cosine similarity, Gaussian noise, Supervised learning...
  • Several recent works have shown how highly realistic human head images can be obtained by training convolutional neural networks to generate them. In order to create a personalized talking head model, these works require training on a large dataset of images of a single person. However, in many practical scenarios, such personalized talking head models need to be learned from a few image views of a person, potentially even a single image. Here, we present a system with such few-shot capability. It performs lengthy meta-learning on a large dataset of videos, and after that is able to frame few- and one-shot learning of neural talking head models of previously unseen people as adversarial training problems with high capacity generators and discriminators. Crucially, the system is able to initialize the parameters of both the generator and the discriminator in a person-specific way, so that training can be based on just a few images and done quickly, despite the need to tune tens of millions of parameters. We show that such an approach is able to learn highly realistic and personalized talking head models of new people and even portrait paintings.
    Meta learning, Embedding, Convolutional neural network, Personalization, Ground truth, Attention, Architecture, Training Image, Inference, Ablation...
  • We propose Image2StyleGAN++, a flexible image editing framework with many applications. Our framework extends the recent Image2StyleGAN in three ways. First, we introduce noise optimization as a complement to the $W^+$ latent space embedding. Our noise optimization can restore high frequency features in images and thus significantly improves the quality of reconstructed images, e.g. a big increase of PSNR from 20 dB to 45 dB. Second, we extend the global $W^+$ latent space embedding to enable local embeddings. Third, we combine embedding with activation tensor manipulation to perform high quality local edits along with global semantic edits on images. Such edits motivate various high quality image editing applications, e.g. image reconstruction, image inpainting, image crossover, local style transfer, image editing using scribbles, and attribute level feature transfer. Examples of the edited images are shown across the paper for visual inspection.
    Embedding, Optimization, Generative Adversarial Net, Latent space, Architecture, Ground truth, Image Processing, Regularization, Neural network, Attention...
  • This paper presents a novel framework to generate realistic face video of an anchor who is reading certain news. This task is also known as Virtual Anchor. Given some paragraphs of words, we first utilize a pretrained Word2Vec model to embed each word into a vector; then we utilize a Seq2Seq-based model to translate these word embeddings into action units and head poses of the target anchor; these action units and head poses are concatenated with facial landmarks as well as the $n$ previously synthesized frames, and the concatenation serves as input to a Pix2PixHD-based model to synthesize realistic facial images for the virtual anchor. The experimental results demonstrate that our framework is feasible for the synthesis of a virtual anchor.
    Generative Adversarial Net, Ground truth, Architecture, Attention, Long short term memory, Intermediate representation, Convolutional neural network, Recurrent neural network, GAN-based model, Latent space...
  • We present a novel framework to generate images of different ages while preserving identity information, a task known as face aging. Different from most recent popular face aging networks utilizing Generative Adversarial Networks (GANs), our approach does not simply transfer a young face to an old one. Instead, we employ edge maps as intermediate representations: first, edge maps of young faces are extracted; a CycleGAN-based network is adopted to transfer them into edge maps of old faces; then another pix2pixHD-based network is adopted to transfer the synthesized edge maps, concatenated with identity information, into old faces. In this way, our method can generate more realistic transferred images, while ensuring that face identity information is well preserved and that the apparent age of the generated image is accurate. Experimental results demonstrate that our method is feasible for face age translation.
    Generative Adversarial Net, Region of interest, Attention, Training set, Intermediate representation, Embedding, Generative model, Ground truth, Glass, Minimax...
  • Automatic music transcription is considered to be one of the hardest problems in music information retrieval, yet recent deep learning approaches have achieved substantial improvements on transcription performance. These approaches commonly employ supervised learning models that predict various time-frequency representations, by minimizing element-wise losses such as the cross entropy function. However, applying the loss in this manner assumes conditional independence of each label given the input, and thus cannot accurately express inter-label dependencies. To address this issue, we introduce an adversarial training scheme that operates directly on the time-frequency representations and makes the output distribution closer to the ground-truth. Through adversarial learning, we achieve a consistent improvement in both frame-level and note-level metrics over Onsets and Frames, a state-of-the-art music transcription model. Our results show that adversarial learning can significantly reduce the error rate while increasing the confidence of the model estimations. Our approach is generic and applicable to any transcription model based on multi-label predictions, which are very common in music signal analysis.
    Generative Adversarial Net, Ground truth, Recurrent neural network, Deep learning, F1 score, Entropy, Conditional Independence, Least squares, Training set, Hidden Markov model...
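A hedged sketch of the idea (not the authors' implementation; `transcription_loss` and its weighting are assumptions for illustration): the element-wise cross-entropy is augmented with an adversarial term that scores the predicted piano roll as a whole, so inter-label dependencies can be penalized:

```python
import numpy as np

def transcription_loss(pred, target, disc_score, adv_weight=0.1):
    """Frame-wise binary cross-entropy plus an adversarial term.

    pred, target : piano-roll activations in [0, 1]
    disc_score   : discriminator's probability that `pred` is a real
                   transcription (the generator wants this -> 1)
    """
    eps = 1e-7
    bce = -np.mean(target * np.log(pred + eps)
                   + (1 - target) * np.log(1 - pred + eps))
    adv = -np.log(disc_score + eps)      # generator's adversarial loss
    return bce + adv_weight * adv

pred = np.array([0.9, 0.1])
target = np.array([1.0, 0.0])
loss = transcription_loss(pred, target, disc_score=1.0)
```

The element-wise term keeps individual frames accurate, while the adversarial term grows when the discriminator can tell the prediction apart from real transcriptions.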
  • As a sub-domain of text-to-image synthesis, text-to-face generation has huge potential in the public safety domain. Due to the lack of datasets, there is almost no related research focusing on text-to-face synthesis. In this paper, we propose a fully-trained Generative Adversarial Network (FTGAN) that trains the text encoder and image decoder at the same time for fine-grained text-to-face generation. With a novel fully-trained generative network, FTGAN can synthesize higher-quality images and ensures that the outputs of the FTGAN are more relevant to the input sentences. In addition, we build a dataset called SCU-Text2face for text-to-face synthesis. Through extensive experiments, the FTGAN shows its superiority in boosting both the generated images' quality and their similarity to the input descriptions. The proposed FTGAN outperforms the previous state of the art, boosting the best reported Inception Score to 4.63 on the CUB dataset. On SCU-Text2face, the face images generated by our proposed FTGAN from the input descriptions alone have an average similarity of 59% to the ground truth, which sets a baseline for text-to-face synthesis.
    Generative Adversarial Net, Ground truth, Attention, Convolutional neural network, COCO simulation, Recurrent neural network, Semantic similarity, Gaussian distribution, GAN-based model, Image Processing...
  • Dramatic advances in generative models have resulted in near photographic quality for artificially rendered faces, animals and other objects in the natural world. In spite of such advances, a higher level understanding of vision and imagery does not arise from exhaustively modeling an object, but instead identifying higher-level attributes that best summarize the aspects of an object. In this work we attempt to model the drawing process of fonts by building sequential generative models of vector graphics. This model has the benefit of providing a scale-invariant representation for imagery whose latent representation may be systematically manipulated and exploited to perform style propagation. We demonstrate these results on a large dataset of fonts and highlight how such a model captures the statistical dependencies and richness of this dataset. We envision that our model can find use as a tool for graphic designers to facilitate font design.
    Generative model, Latent space, Scale invariance, Architecture, Long short term memory, Optimization, Autoencoder, Manifold, Training set, Entropy...
  • An autoencoder is a neural network which projects data to and from a lower-dimensional latent space, where this data is easier to understand and model. The autoencoder consists of two sub-networks, the encoder and the decoder, which carry out these transformations. The neural network is trained such that the output is as close to the input as possible, the data having gone through an information bottleneck: the latent space. This tool bears significant resemblance to Principal Component Analysis (PCA), with two main differences. Firstly, the autoencoder is a non-linear transformation, contrary to PCA, which makes the autoencoder more flexible and powerful. Secondly, the axes found by a PCA are orthogonal, and are ordered in terms of the amount of variability which the data presents along these axes. This makes the interpretability of the PCA much greater than that of the autoencoder, which does not have these attributes. Ideally, then, we would like an autoencoder whose latent space consists of independent components, ordered by decreasing importance to the data. In this paper, we propose an algorithm to create such a network. First, we create an iterative algorithm which progressively increases the size of the latent space, learning a new dimension at each step. Second, we propose a covariance loss term to add to the standard autoencoder loss function, as well as a normalisation layer just before the latent space, which encourages the latent space components to be statistically independent. We demonstrate the results of this autoencoder on simple geometric shapes, and find that the algorithm indeed finds a meaningful representation in the latent space. This means that subsequent interpolation in the latent space has meaning with respect to the geometric properties of the images.
    Autoencoder, Latent space, Principal component analysis, Covariance, Neural network, Architecture, Uniform distribution, Data sampling, Generative Adversarial Net, Hidden layer...
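The covariance loss term described above can be sketched in a few lines (a hypothetical illustration, not the paper's code): penalize the off-diagonal entries of the batch covariance of the latent codes, which vanish when the components are linearly uncorrelated:

```python
import numpy as np

def covariance_loss(z):
    """Sum of squared off-diagonal entries of the latent covariance:
    zero when the latent components are (linearly) uncorrelated.
    z: latent codes of shape (batch, latent_dim)."""
    zc = z - z.mean(axis=0)
    cov = zc.T @ zc / (len(z) - 1)           # sample covariance matrix
    off_diag = cov - np.diag(np.diag(cov))   # keep only cross terms
    return float(np.sum(off_diag ** 2))

rng = np.random.default_rng(0)
a = rng.standard_normal(1000)
independent = np.stack([a, rng.standard_normal(1000)], axis=1)
correlated = np.stack([a, a + 0.01 * rng.standard_normal(1000)], axis=1)
```

In training this term would be added to the reconstruction loss; here a correlated batch is penalized far more heavily than an independent one.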
  • In this paper, we introduce a tunable generative adversarial network (TunaGAN) that uses an auxiliary network on top of existing generator networks (Style-GAN) to modify high-resolution face images according to the user's high-level instructions, with good qualitative and quantitative performance. To optimize for feature disentanglement, we also investigate two different latent spaces that can be traversed for modification. The problem of mode collapse is characterized in detail for model robustness. This work could be easily extended to a content-aware image editor based on other GANs and provides insight into mode-collapse problems in more general settings.
    Generative Adversarial Net, Latent space, Neural network, Multilayer perceptron, Feature space, Architecture, Glass, Support vector machine, Vector space, Fully connected layer...
  • One of the main motivations for training high quality image generative models is their potential use as tools for image manipulation. Recently, generative adversarial networks (GANs) have been able to generate images of remarkable quality. Unfortunately, adversarially-trained unconditional generator networks have not been successful as image priors. One of the main requirements for a network to act as a generative image prior, is being able to generate every possible image from the target distribution. Adversarial learning often experiences mode-collapse, which manifests in generators that cannot generate some modes of the target distribution. Another requirement often not satisfied is invertibility i.e. having an efficient way of finding a valid input latent code given a required output image. In this work, we show that differently from earlier GANs, the very recently proposed style-generators are quite easy to invert. We use this important observation to propose style generators as general purpose image priors. We show that style generators outperform other GANs as well as Deep Image Prior as priors for image enhancement tasks. The latent space spanned by style-generators satisfies linear identity-pose relations. The latent space linearity, combined with invertibility, allows us to animate still facial images without supervision. Extensive experiments are performed to support the main contributions of this paper.
    Generative Adversarial Net, Generative model, Optimization, Architecture, Latent space, Image Processing, Deep learning, Training Image, Gaussian mixture model, Autoencoder...
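The invertibility requirement above amounts to optimizing a latent code until the generator reproduces a target image. A toy sketch with a linear stand-in for the generator (an assumption for illustration; real style-generator inversion optimizes through the full network, typically with a perceptual loss):

```python
import numpy as np

def invert(G, target, steps=2000):
    """Find a latent z with G @ z ~= target by gradient descent on
    0.5 * ||G z - target||^2 (a linear stand-in for a generator)."""
    lr = 1.0 / np.linalg.norm(G, ord=2) ** 2   # step size below 2/L
    z = np.zeros(G.shape[1])
    for _ in range(steps):
        z -= lr * G.T @ (G @ z - target)       # gradient step
    return z

rng = np.random.default_rng(1)
G = rng.standard_normal((8, 4))                # "generator": latent 4 -> image 8
z_true = rng.standard_normal(4)
z_hat = invert(G, G @ z_true)                  # recover the latent code
```

For an invertible (here, full-column-rank) generator the recovered code matches the one that produced the target, which is the property the paper exploits to use style generators as image priors.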
  • Existing public face datasets are strongly biased toward Caucasian faces, and other races (e.g., Latino) are significantly underrepresented. This can lead to inconsistent model accuracy, limit the applicability of face analytic systems to non-White race groups, and adversely affect research findings based on such skewed data. To mitigate the race bias in these datasets, we construct a novel face image dataset, containing 108,501 images, with an emphasis of balanced race composition in the dataset. We define 7 race groups: White, Black, Indian, East Asian, Southeast Asian, Middle East, and Latino. Images were collected from the YFCC-100M Flickr dataset and labeled with race, gender, and age groups. Evaluations were performed on existing face attribute datasets as well as novel image datasets to measure generalization performance. We find that the model trained from our dataset is substantially more accurate on novel datasets and the accuracy is consistent between race and gender groups.
    Classification, Image Processing, Twitter, Embedding, Training set, Ground truth, Statistics, Machine learning, Creative commons license, Security...
  • The image generation task has received increasing attention because of its wide application in security and entertainment. Sketch-based face generation brings more fun and better quality of image generation due to supervised interaction. However, when a sketch poorly aligned with the true face is given as input, existing supervised image-to-image translation methods often cannot generate acceptable photo-realistic face images. To address this problem, in this paper we propose Cali-Sketch, a poorly-drawn-sketch to photo-realistic-image generation method. Cali-Sketch explicitly models stroke calibration and image generation using two constituent networks: a Stroke Calibration Network (SCN), which calibrates strokes of facial features and enriches facial details while preserving the original intent features; and an Image Synthesis Network (ISN), which translates the calibrated and enriched sketches to photo-realistic face images. In this way, we manage to decouple a difficult cross-domain translation problem into two easier steps. Extensive experiments verify that the face photos generated by Cali-Sketch are both photo-realistic and faithful to the input sketches, compared with state-of-the-art methods.
    Calibration, Generative Adversarial Net, Ground truth, Architecture, Sparsity, Attention, Total-Variation regularization, Security, Training set, Deep learning, ...
  • Recent face reenactment studies have achieved remarkable success either between two identities or in the many-to-one task. However, existing methods have limited scalability when the target person is not a predefined specific identity. To address this limitation, we present a novel many-to-many face reenactment framework, named FaceSwapNet, which allows transferring facial expressions and movements from one source face to arbitrary targets. Our proposed approach is composed of two main modules: the landmark swapper and the landmark-guided generator. Instead of maintaining independent models for each pair of persons, the former module uses two encoders and one decoder to adapt anyone's facial landmarks to the target person. Using the neutral expression of the target person as a reference image, the latter module leverages geometry information from the swapped landmarks to generate photo-realistic, emotion-consistent images. In addition, a novel triplet perceptual loss is proposed to force the generator to learn geometry and appearance information simultaneously. We evaluate our model on the RaFD dataset and the results demonstrate the superior quality of reenacted images as well as the flexibility of transferring facial movements between identities.
    Latent space, Anyon, Generative Adversarial Net, Architecture, Ground truth, Ablation, Attention, Aperture synthesis, Training set, Geometry, ...
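The triplet perceptual loss mentioned in the FaceSwapNet abstract can be sketched as a standard triplet margin loss over perceptual feature maps; the feature extractor, squared-L2 distance, and margin below are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def triplet_perceptual_loss(f_anchor, f_pos, f_neg, margin=1.0):
    """Triplet margin loss over (hypothetical) perceptual feature maps.

    Pulls the generated image's features toward the reference (positive)
    and pushes them away from a mismatched identity (negative).
    """
    d_pos = np.mean((f_anchor - f_pos) ** 2)
    d_neg = np.mean((f_anchor - f_neg) ** 2)
    return max(0.0, d_pos - d_neg + margin)

# Toy feature maps standing in for an off-the-shelf perceptual network.
rng = np.random.default_rng(0)
anchor = rng.normal(size=(8, 8))
positive = anchor + 0.01 * rng.normal(size=(8, 8))  # near-identical features
negative = rng.normal(size=(8, 8))                  # unrelated identity

loss = triplet_perceptual_loss(anchor, positive, negative)
```

A well-matched positive drives the hinge toward zero; swapping positive and negative makes the loss strictly positive.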
  • A new generative adversarial network is developed for joint distribution matching. Distinct from most existing approaches, which only learn conditional distributions, the proposed model aims to learn a joint distribution of multiple random variables (domains). This is achieved by learning to sample from conditional distributions between the domains, while simultaneously learning to sample from the marginals of each individual domain. The proposed framework consists of multiple generators and a single softmax-based critic, all jointly trained via adversarial learning. From a simple noise source, the proposed framework allows synthesis of draws from the marginals, conditional draws given observations from a subset of random variables, or complete draws from the full joint distribution. Most examples considered are for joint analysis of two domains, with examples for three domains also presented.
    Neural network, Generative Adversarial Net, Minimax, Convolutional neural network, Latent variable, Inference, Ranking, Long short term memory, Autoencoder, Unsupervised learning, ...
  • Synthetic image translation has significant potential in autonomous transportation systems, due to the expense of data collection and annotation as well as the unmanageable diversity of real-world situations. The main issue with unpaired image-to-image translation is the ill-posed nature of the problem. In this work, we propose a novel method for constraining the output space of unpaired image-to-image translation. We assume that the environment of the source domain is known (e.g., synthetically generated), and we propose to explicitly enforce preservation of the ground-truth labels on the translated images. We experiment with preserving ground-truth information such as semantic segmentation, disparity, and instance segmentation. We show significant evidence that our method achieves improved performance over the state-of-the-art UNIT model for translating images from SYNTHIA to Cityscapes. The generated images are perceived as more realistic in human surveys and outperform UNIT when used in a domain adaptation scenario for semantic segmentation.
    Ground truth, Semantic segmentation, Generative Adversarial Net, Architecture, Statistical estimator, Autonomous driving, Object detection, COCO simulation, Training set, Decision making, ...
  • Image translation is a burgeoning field in computer vision where the goal is to learn the mapping between an input image and an output image. However, most recent methods require multiple generators for modeling different domain mappings, which are inefficient and ineffective on some multi-domain image translation tasks. In this paper, we propose a novel method, SingleGAN, to perform multi-domain image-to-image translations with a single generator. We introduce the domain code to explicitly control the different generative tasks and integrate multiple optimization goals to ensure the translation. Experimental results on several unpaired datasets show superior performance of our model in translation between two domains. Besides, we explore variants of SingleGAN for different tasks, including one-to-many domain translation, many-to-many domain translation and one-to-one domain translation with multimodality. The extended experiments show the universality and extensibility of our model.
    Generative Adversarial Net, Classification, Image Processing, Optimization, Supervised learning, Feature space, Architecture, Cosine similarity, Least squares, Unsupervised learning, ...
  • Face transfer animates the facial performances of the character in the target video by a source actor. Traditional methods are typically based on face modeling. We propose an end-to-end face transfer method based on Generative Adversarial Network. Specifically, we leverage CycleGAN to generate the face image of the target character with the corresponding head pose and facial expression of the source. In order to improve the quality of generated videos, we adopt PatchGAN and explore the effect of different receptive field sizes on generated images.
    Architecture, Image Processing, Hyperparameter, Nearest-neighbor site, Deep Neural Networks, Random Field, Glass, Statistics, Least squares, Computer graphics, ...
  • It has been claimed that the standard model of cosmology (LCDM) cannot easily account for a number of observations on relatively small scales, motivating extensions to the standard model. Here we introduce a new suite of cosmological simulations that systematically explores three plausible extensions: warm dark matter, self-interacting dark matter, and a running of the scalar spectral index of density fluctuations. Current observational constraints are used to specify the additional parameters that come with these extensions. We examine a large range of observable metrics on small scales, including the halo mass function, density and circular velocity profiles, the abundance of satellite subhaloes, and halo concentrations. For any given metric, significant degeneracies can be present between the extensions. In detail, however, the different extensions have quantitatively distinct mass and radial dependencies, suggesting that a multi-probe approach over a range of scales can be used to break the degeneracies. We also demonstrate that the relative effects on the radial density profiles in the different extensions (compared to the standard model) are converged down to significantly smaller radii than are the absolute profiles. We compare the derived cosmological trends with the impact of baryonic physics using the EAGLE and ARTEMIS simulations. Significant degeneracies are also present between baryonic physics and cosmological variations (with both having similar magnitude effects on some observables). Given the inherent uncertainties both in the modelling of galaxy formation physics and extensions to LCDM, a systematic and simultaneous exploration of both is strongly warranted.
    Cosmology, Dark matter subhalo, Self-interacting dark matter, Warm dark matter, Dark matter, Virial mass, Standard Model, Lambda-CDM model, Circular velocity, Halo mass function, ...
  • Hand gesture-to-gesture translation in the wild is a challenging task since hand gestures can have arbitrary poses, sizes, locations and self-occlusions. Therefore, this task requires a high-level understanding of the mapping between the input source gesture and the output target gesture. To tackle this problem, we propose a novel hand Gesture Generative Adversarial Network (GestureGAN). GestureGAN consists of a single generator $G$ and a discriminator $D$, which takes as input a conditional hand image and a target hand skeleton image. GestureGAN utilizes the hand skeleton information explicitly, and learns the gesture-to-gesture mapping through two novel losses, the color loss and the cycle-consistency loss. The proposed color loss handles the issue of "channel pollution" while back-propagating the gradients. In addition, we present the Fr\'echet ResNet Distance (FRD) to evaluate the quality of generated images. Extensive experiments on two widely used benchmark datasets demonstrate that the proposed GestureGAN achieves state-of-the-art performance on the unconstrained hand gesture-to-gesture translation task. Moreover, the generated images are of high quality and photo-realistic, allowing them to be used as data augmentation to improve the performance of a hand gesture classifier. Our model and code are available at https://github.com/Ha0Tang/GestureGAN.
    Generative Adversarial Net, Fréchet distance, Training set, Generative model, Convolutional neural network, Feature space, Image Processing, Feature vector, Fully connected layer, Optimization, ...
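GestureGAN's cycle-consistency loss pairs a forward translation with a backward one, each conditioned on a skeleton map. A minimal sketch, assuming an L1 reconstruction penalty and a generator signature `G(image, skeleton)` (both assumptions; the color loss is omitted):

```python
import numpy as np

def cycle_consistency_loss(x, skeleton_x, skeleton_y, G):
    """L1 cycle loss: translate x to the target pose, then back again."""
    y_fake = G(x, skeleton_y)        # forward hop to the target gesture
    x_recon = G(y_fake, skeleton_x)  # hop back to the original gesture
    return np.mean(np.abs(x_recon - x))

# A trivially invertible "generator" so the sketch runs end-to-end;
# a real conditional generator would of course use the skeleton input.
identity_G = lambda img, skel: img

x = np.random.default_rng(1).uniform(size=(4, 4))
loss = cycle_consistency_loss(x, x, x, identity_G)
```

A perfect round trip gives zero loss; any generator that loses information leaves a positive residual.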
  • In this paper, we aim at solving the multi-domain image-to-image translation problem with a unified model in an unsupervised manner. The most successful work in this area refers to StarGAN, which works well in tasks like face attribute modulation. However, StarGAN is unable to match multiple translation mappings when encountering general translations with very diverse domain shifts. On the other hand, StarGAN adopts an Encoder-Decoder-Discriminator (EDD) architecture, where the model is time-consuming and unstable to train. To this end, we propose a Compact, effective, robust, and fast GAN model, termed CerfGAN, to solve the above problem. In principle, CerfGAN contains a novel component, i.e., a multi-class discriminator (MCD), which gives the model an extremely powerful ability to match multiple translation mappings. To stabilize the training process, MCD also plays the role of the encoder in CerfGAN, which saves a lot of computation and memory costs. We perform extensive experiments to verify the effectiveness of the proposed method. Quantitatively, CerfGAN is demonstrated to handle a series of image-to-image translation tasks including style transfer, season transfer, face hallucination, etc., where the input images are sampled from diverse domains. The comparisons to several recently proposed approaches demonstrate the superiority and novelty of the proposed method.
    Generative Adversarial Net, Architecture, Autoencoder, Classification, Image Processing, MNIST dataset, Gradient flow, Convolutional neural network, Google.com, Inference, ...
  • Image-to-image translation models have shown remarkable ability on transferring images among different domains. Most existing work follows the setting that the source domain and target domain remain the same at training and inference, which cannot be generalized to scenarios that translate an image from an unseen domain to another unseen domain. In this work, we propose the Unsupervised Zero-Shot Image-to-image Translation (UZSIT) problem, which aims to learn a model that can transfer translation knowledge from seen domains to unseen domains. Accordingly, we propose a framework called ZstGAN: By introducing an adversarial training scheme, ZstGAN learns to model each domain with a domain-specific feature distribution that is semantically consistent on vision and attribute modalities. Then the domain-invariant features are disentangled with a shared encoder for image generation. We carry out extensive experiments on the CUB and FLO datasets, and the results demonstrate the effectiveness of the proposed method on the UZSIT task. Moreover, ZstGAN shows significant accuracy improvements over state-of-the-art zero-shot learning methods on CUB and FLO.
    Classification, Generative Adversarial Net, Optimization, Convolutional neural network, Latent space, Inference, Generative model, Machine translation, Nearest-neighbor site, Mutual information, ...
  • We introduce GANHopper, an unsupervised image-to-image translation network that transforms images gradually between two domains, through multiple hops. Instead of executing translation directly, we steer the translation by requiring the network to produce in-between images which resemble weighted hybrids between images from the two input domains. Our network is trained on unpaired images from the two domains only, without any in-between images. All hops are produced using a single generator along each direction. In addition to the standard cycle-consistency and adversarial losses, we introduce a new hybrid discriminator, which is trained to classify the intermediate images produced by the generator as weighted hybrids, with weights based on a predetermined hop count. We also introduce a smoothness term to constrain the magnitude of each hop, further regularizing the translation. Compared to previous methods, GANHopper excels at image translations involving domain-specific image features and geometric variations while also preserving non-domain-specific features such as backgrounds and general color schemes.
    Generative Adversarial Net, Architecture, Latent space, Training set, Ground truth, Training Image, Image segmentation, Semantic segmentation, Image Processing, Saturnian satellites, ...
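GANHopper's hybrid discriminator is supervised with blend weights derived from a predetermined hop count; a plain linear schedule is the simplest guess (the paper's exact schedule is an assumption here):

```python
def hop_weights(num_hops):
    """Target blend weight for each in-between image: the i-th hop should
    resemble an (i / num_hops)-weighted hybrid of the two domains.
    Illustrative linear schedule only."""
    return [i / num_hops for i in range(1, num_hops + 1)]

# Four hops from domain A toward domain B; the final hop is a full translation.
weights = hop_weights(4)
```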
  • Voice impersonation is not the same as voice transformation, although the latter is an essential element of it. In voice impersonation, the resultant voice must convincingly convey the impression of having been naturally produced by the target speaker, mimicking not only the pitch and other perceivable signal qualities, but also the style of the target speaker. In this paper, we propose a novel neural network based speech quality- and style- mimicry framework for the synthesis of impersonated voices. The framework is built upon a fast and accurate generative adversarial network model. Given spectrographic representations of source and target speakers' voices, the model learns to mimic the target speaker's voice quality and style, regardless of the linguistic content of either's voice, generating a synthetic spectrogram from which the time domain signal is reconstructed using the Griffin-Lim method. In effect, this model reframes the well-known problem of style-transfer for images as the problem of style-transfer for speech signals, while intrinsically addressing the problem of durational variability of speech sounds. Experiments demonstrate that the model can generate extremely convincing samples of impersonated speech. It is even able to impersonate voices across different genders effectively. Results are qualitatively evaluated using standard procedures for evaluating synthesized voices.
    Generative Adversarial Net, Generative model, Convolutional neural network, Network model, Neural network, Optimization, Architecture, Spectral envelope, Deep Neural Networks, Time Series, ...
  • Unsupervised image-to-image translation aims at learning the relationship between samples from two image domains without supervised pair information. The relationship between two domain images can be one-to-one, one-to-many or many-to-many. In this paper, we study the one-to-many unsupervised image translation problem in which an input sample from one domain can correspond to multiple samples in the other domain. To learn the complex relationship between the two domains, we introduce an additional variable to control the variations in our one-to-many mapping. A generative model with an XO-structure, called the XOGAN, is proposed to learn the cross domain relationship among the two domains and the additional variables. Not only can we learn to translate between the two image domains, we can also handle the translated images with additional variations. Experiments are performed on unpaired image generation tasks, including edges-to-objects translation and facial image translation. We show that the proposed XOGAN model can generate plausible images and control variations, such as color and texture, of the generated images. Moreover, while state-of-the-art unpaired image generation algorithms tend to generate images with monotonous colors, XOGAN can generate more diverse results.
    Generative model, Gaussian distribution, Software, Ground truth, Deep Neural Networks, Image Processing, Autoencoder, Hyperparameter, Gaussian noise, Training set, ...
  • Standard neural networks are often overconfident when presented with data outside the training distribution. We introduce HyperGAN, a new generative model for learning a distribution of neural network parameters. HyperGAN does not require restrictive assumptions on priors, and networks sampled from it can be used to quickly create very large and diverse ensembles. HyperGAN employs a novel mixer to project prior samples to a latent space with correlated dimensions, and samples from the latent space are then used to generate weights for each layer of a deep neural network. We show that HyperGAN can learn to generate parameters which label the MNIST and CIFAR-10 datasets with performance competitive with fully supervised learning, while learning a rich distribution of effective parameters. We also show that HyperGAN can provide better uncertainty estimates than standard ensembles when evaluated on out-of-distribution data as well as adversarial examples.
    Neural network, Adversarial examples, Latent space, Generative Adversarial Net, Entropy, Classification, Architecture, Generative model, Training set, Supervised learning, ...
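HyperGAN's pipeline (mixer, correlated per-layer latent codes, per-layer weight generators) can be caricatured with plain linear maps; all shapes and the linear parametrization below are illustrative assumptions, not the paper's architecture:

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical target: a tiny 2-layer MLP whose weights we generate.
layer_shapes = [(4, 8), (8, 2)]
latent_dim, noise_dim = 16, 32

# "Mixer": one linear map projecting the prior sample to per-layer latent
# codes whose dimensions are correlated (they share one noise source).
W_mix = rng.normal(size=(noise_dim, latent_dim * len(layer_shapes)))

# One linear "weight generator" per layer (stand-ins for the paper's nets).
W_gen = [rng.normal(size=(latent_dim, int(np.prod(s)))) for s in layer_shapes]

def sample_network():
    """Draw noise once, mix it into per-layer codes, emit all layer weights."""
    z = rng.normal(size=noise_dim)
    codes = (z @ W_mix).reshape(len(layer_shapes), latent_dim)
    return [(c @ g).reshape(s) for c, g, s in zip(codes, W_gen, layer_shapes)]

# Each call yields a fresh, complete set of weights -- one ensemble member.
net_a, net_b = sample_network(), sample_network()
```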
  • We propose an alternative generator architecture for generative adversarial networks, borrowing from style transfer literature. The new architecture leads to an automatically learned, unsupervised separation of high-level attributes (e.g., pose and identity when trained on human faces) and stochastic variation in the generated images (e.g., freckles, hair), and it enables intuitive, scale-specific control of the synthesis. The new generator improves the state-of-the-art in terms of traditional distribution quality metrics, leads to demonstrably better interpolation properties, and also better disentangles the latent factors of variation. To quantify interpolation quality and disentanglement, we propose two new, automated methods that are applicable to any generator architecture. Finally, we introduce a new, highly varied and high-quality dataset of human faces.
    Latent space, Architecture, Generative Adversarial Net, Path length, Regularization, Hyperparameter, Statistics, Attention, Classification, Manifold, ...
  • Fuzzy Dark Matter (FDM) is one of the recent models for dark matter. According to this model, dark matter is made of very light scalar particles with considerable quantum mechanical effects on galactic scales, which solves many problems of Cold Dark Matter (CDM). Here we use the observed data from the rotation curve of the Milky Way (MW) galaxy to compare the results from the FDM and CDM models. We show that FDM adds a local peak to the rotation curve close to the center of the bulge, whose position and amplitude depend on the mass of the FDM particle. From fitting the observed rotation curve with our expectation from FDM, we find the FDM particle mass to be $m = 2.5^{+3.6}_{-2.0} \times10^{-21}$eV. We note that the local peak of the rotation curve in the MW can also be explained in the CDM model with an extra inner bulge component for the MW galaxy. We conclude that the FDM model explains this peak without the need for extra structure in the bulge.
    Fuzzy dark matter, Milky Way, Rotation Curve, Galaxy, Dark matter, Cold dark matter, Navarro-Frenk-White profile, Lambda-CDM model, Rotation curve of the Milky Way, Dark matter halo, ...
  • We present MeerKAT 1000 MHz and 1400 MHz observations of a bright radio galaxy in the southern hemisphere, ESO~137-006. The galaxy lies at the centre of the massive and merging Norma galaxy cluster. The MeerKAT continuum images (rms ~0.02 mJy/beam at ~10" resolution) reveal new features that have never been seen in a radio galaxy before: collimated synchrotron threads of yet unknown origin, which link the extended and bent radio lobes of ESO~137-006. The most prominent of these threads stretches in projection for about 80 kpc and is about 1 kpc in width. The radio spectrum of the threads is steep, with a spectral index of up to $\alpha\simeq 2$ between 1000 MHz and 1400 MHz.
    European Southern Observatory, Radio lobes, MeerKAT, Radio galaxy, Synchrotron, Milky Way, Intra-cluster medium, Radio sources, Full width at half maximum, Cluster of galaxies, ...
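The quoted spectral index follows the usual two-point definition with the $S \propto \nu^{-\alpha}$ sign convention (steep spectrum means large positive $\alpha$). The flux densities below are made-up numbers chosen only to make the arithmetic checkable, not measurements from the paper:

```python
import math

def spectral_index(s1, s2, nu1=1000.0, nu2=1400.0):
    """Two-point spectral index alpha in the S ∝ nu^(-alpha) convention,
    from flux densities s1 at nu1 and s2 at nu2 (frequencies in MHz)."""
    return math.log(s1 / s2) / math.log(nu2 / nu1)

# Illustrative values: a source whose flux falls as nu^-2 between the
# two MeerKAT bands should recover alpha = 2 exactly.
alpha = spectral_index(1.0, 1.0 * (1400.0 / 1000.0) ** -2.0)
```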
  • In this paper, we explore illustrations in children's books as a new domain in unpaired image-to-image translation. We show that although the current state-of-the-art image-to-image translation models successfully transfer either the style or the content, they fail to transfer both at the same time. We propose a new generator network to address this issue and show that the resulting network strikes a better balance between style and content. There are no well-defined or agreed-upon evaluation metrics for unpaired image-to-image translation. So far, the success of image translation models has been based on subjective, qualitative visual comparison on a limited number of images. To address this problem, we propose a new framework for the quantitative evaluation of image-to-illustration models, where both content and style are taken into account using separate classifiers. In this new evaluation framework, our proposed model performs better than the current state-of-the-art models on the illustrations dataset. Our code and pretrained models can be found at https://github.com/giddyyupp/ganilla.
    Generative Adversarial Net, Ablation, Convolutional neural network, Training set, Training Image, Architecture, Attention, Nearest-neighbor site, Ground truth, Optimization, ...
  • Most existing neural network models for music generation use recurrent neural networks. However, the recent WaveNet model proposed by DeepMind shows that convolutional neural networks (CNNs) can also generate realistic musical waveforms in the audio domain. In this light, we investigate using CNNs for generating melody (a series of MIDI notes) one bar after another in the symbolic domain. In addition to the generator, we use a discriminator to learn the distributions of melodies, making it a generative adversarial network (GAN). Moreover, we propose a novel conditional mechanism to exploit available prior knowledge, so that the model can generate melodies either from scratch, by following a chord sequence, or by conditioning on the melody of previous bars (e.g. a priming melody), among other possibilities. The resulting model, named MidiNet, can be expanded to generate music with multiple MIDI channels (i.e. tracks). We conduct a user study to compare eight-bar melodies generated by MidiNet and by Google's MelodyRNN models, each time using the same priming melody. Results show that MidiNet performs comparably with the MelodyRNN models in being realistic and pleasant to listen to, yet MidiNet's melodies are reported to be much more interesting.
    Convolutional neural network, Recurrent neural network, Neural network, Network model, Google.com, Deep Neural Networks, Architecture, Minimax, Music information retrieval, Optimization, ...
  • In recent years, neural networks have been used to generate symbolic melodies. However, the long-term structure in the melody has posed great difficulty for designing a good model. In this paper, we present a hierarchical recurrent neural network for melody generation, which consists of three Long Short-Term Memory (LSTM) subnetworks working in a coarse-to-fine manner along time. Specifically, the three subnetworks generate bar profiles, beat profiles and notes in turn, and the outputs of the higher-level subnetworks are fed into the lower-level subnetworks, serving as guidance for generating the finer time-scale melody components. Two human behavior experiments demonstrate the advantage of this structure over a single-layer LSTM that attempts to learn all hidden structures in melodies. Compared with the state-of-the-art models MidiNet and MusicVAE, the hierarchical recurrent neural network produces better melodies as evaluated by humans.
    Recurrent neural network, Long short term memory, Neural network, Human dynamics, Hidden layer, Training set, Statistics, Autoencoder, Overfitting, Optimization, ...
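The coarse-to-fine idea, high-level profiles conditioning lower-level note generation, can be mimicked without any neural network at all; the two-value "bar profile" and the note ranges below are arbitrary stand-ins for the learned subnetworks:

```python
import random

random.seed(0)

def generate_bar_profiles(num_bars):
    """High-level subnetwork stand-in: one coarse label per bar."""
    return [random.choice([0, 1]) for _ in range(num_bars)]  # low/high register

def generate_notes(profile, notes_per_bar=4):
    """Low-level stand-in: sample notes conditioned on the bar profile."""
    base = 60 if profile == 0 else 72  # MIDI C4 vs C5, illustrative choice
    return [base + random.randrange(0, 5) for _ in range(notes_per_bar)]

# The coarse output guides the fine output, bar by bar.
melody = [generate_notes(p) for p in generate_bar_profiles(2)]
```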
  • Home design is a complex task that normally requires architects to finish with their professional skills and tools. It would be fascinating if one could produce a house plan intuitively, without much knowledge of home design or experience with complex design tools, for example via natural language. In this paper, we formulate it as a language conditioned visual content generation problem that is further divided into a floor plan generation and an interior texture (such as floor and wall) synthesis task. The only control signal of the generation process is the linguistic expression given by users that describes the house details. To this end, we propose a House Plan Generative Model (HPGM) that first translates the language input to a structural graph representation and then predicts the layout of rooms with a Graph Conditioned Layout Prediction Network (GC LPN) and generates the interior texture with a Language Conditioned Texture GAN (LCT-GAN). With some post-processing, the final product of this task is a 3D house model. To train and evaluate our model, we build the first Text-to-3D House Model dataset.
    Graph, Natural language, Generative model, Ground truth, Embedding, Main sequence star, Attention, Local neighbourhood, Architecture, Bayesian posterior probability, ...
  • It has recently been shown that Generative Adversarial Networks (GANs) can produce synthetic images of exceptional visual fidelity. In this work, we propose a GAN-based method for automatic face aging. Contrary to previous works employing GANs for altering facial attributes, we place particular emphasis on preserving the original person's identity in the aged version of his/her face. To this end, we introduce a novel approach for "Identity-Preserving" optimization of GAN's latent vectors. The objective evaluation of the resulting aged and rejuvenated face images by state-of-the-art face recognition and age estimation solutions demonstrates the high potential of the proposed method.
    Optimization, Convolutional neural network, Autoencoder, Euclidean distance, Generative model, Neural network, Ground truth, Deep learning, Binary number, Visual observation, ...
  • Age progression and regression refer to aesthetically rendering a given face image to present effects of face aging and rejuvenation, respectively. Although numerous studies have been conducted on this topic, there are two major problems: 1) multiple models are usually trained to simulate different age mappings, and 2) the photo-realism of generated face images is heavily influenced by the variation of training images in terms of pose, illumination, and background. To address these issues, in this paper, we propose a framework based on conditional Generative Adversarial Networks (cGANs) to achieve age progression and regression simultaneously. In particular, since face aging and rejuvenation are largely different in terms of image translation patterns, we model these two processes using two separate generators, each dedicated to one age-changing process. In addition, we exploit spatial attention mechanisms to limit image modifications to regions closely related to age changes, so that images with high visual fidelity can be synthesized for in-the-wild cases. Experiments on multiple datasets demonstrate the ability of our model to synthesize lifelike face images at desired ages with personalized features well preserved, while keeping age-irrelevant regions unchanged.
    Attention, Regression, Generative Adversarial Net, Architecture, Training Image, GAN-based model, Deep learning, Graph, Ground truth, Optimization, ...
  • Face aging is of great importance for cross-age recognition and entertainment-related applications. Recently, conditional generative adversarial networks (cGANs) have achieved impressive results for face aging. Existing cGAN-based methods usually require a pixel-wise loss to keep the identity and background consistent. However, minimizing the pixel-wise loss between the input and synthesized images likely results in a ghosted or blurry face. To address this deficiency, this paper introduces an Attention Conditional GANs (AcGANs) approach for face aging, which utilizes an attention mechanism to alter only the regions relevant to face aging. In doing so, the synthesized face can well preserve the background information and personal identity without using the pixel-wise loss, and the ghost artifacts and blurriness can be significantly reduced. On the benchmark Morph dataset, both qualitative and quantitative experimental results demonstrate superior performance over existing algorithms in terms of image quality, personal identity, and age accuracy.
    Attention, Generative Adversarial Net, Classification, Ground truth, Application programming interface, Autoencoder, Training Image, Architecture, Manifold, Minimax, ...
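The attention mechanism in the two face-aging abstracts above amounts to blending the generator's output with the input image through a learned spatial mask, so pixels outside the attended regions pass through untouched. A sketch with a hand-made mask standing in for the learned one:

```python
import numpy as np

def attention_compose(inp, generated, mask):
    """Edit only where the attention mask is high; keep the input pixels
    elsewhere (background and identity-irrelevant regions)."""
    return mask * generated + (1.0 - mask) * inp

rng = np.random.default_rng(7)
inp = rng.uniform(size=(8, 8))
gen = rng.uniform(size=(8, 8))

mask = np.zeros((8, 8))
mask[2:6, 2:6] = 1.0  # pretend the network attends to a central face region

out = attention_compose(inp, gen, mask)
```

With a binary mask the composition reduces to a cut-and-paste; a real learned mask takes soft values in (0, 1).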
  • Deep Convolutional Neural Networks (CNNs) have drawn great attention in image super-resolution (SR). Recently, the visual attention mechanism, which exploits both feature importance and contextual cues, has been introduced to image SR and proves effective in improving CNN-based SR performance. In this paper, we thoroughly investigate the attention mechanisms in an SR model and show how simple and effective improvements on these ideas advance the state of the art. We further propose a unified approach called "multi-grained attention networks (MGAN)" which fully exploits the advantages of multi-scale and attention mechanisms in SR tasks. In our method, the importance of each neuron is computed according to its surrounding regions in a multi-grained fashion and then used to adaptively re-scale the feature responses. More importantly, the "channel attention" and "spatial attention" strategies in previous methods can essentially be considered as two special cases of our method. We also introduce multi-scale dense connections to extract image features at multiple scales and capture the features of different layers through dense skip connections. Ablation studies on benchmark datasets demonstrate the effectiveness of our method. In comparison with other state-of-the-art SR methods, our method shows superiority in terms of both accuracy and model size.
    Attention, Convolutional neural network, Ablation, Architecture, Feature extraction, Scale factor, Sub-pixel, Information flow, Image Processing, Deep learning, ...
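"Channel attention" in SR networks commonly follows the squeeze-and-excitation pattern (an assumption about MGAN's exact form): global-average-pool each channel, pass the result through a small gating MLP, then re-scale the feature maps by the resulting per-channel gates:

```python
import numpy as np

def channel_attention(features, w1, w2):
    """Squeeze-and-excitation-style channel re-scaling (illustrative).

    features: (C, H, W) feature maps; w1, w2: weights of the gating MLP.
    """
    squeezed = features.mean(axis=(1, 2))          # global average pool -> (C,)
    hidden = np.maximum(squeezed @ w1, 0.0)        # ReLU bottleneck
    gates = 1.0 / (1.0 + np.exp(-(hidden @ w2)))   # sigmoid gates in (0, 1)
    return features * gates[:, None, None]         # re-scale each channel

rng = np.random.default_rng(3)
C, H, W, r = 8, 4, 4, 2                            # r: bottleneck reduction ratio
feats = rng.normal(size=(C, H, W))
w1 = rng.normal(size=(C, C // r))
w2 = rng.normal(size=(C // r, C))

out = channel_attention(feats, w1, w2)
```

Because every gate lies in (0, 1), the block can only attenuate channels, never amplify them; spatial attention applies the same idea per pixel instead of per channel.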