Datasets and Code

  • Misinformation Detection During High Impact Events: An Application to COVID-19

  • Social media has become an important communication channel during high-impact events, such as the COVID-19 pandemic. Because misinformation can spread rapidly on social media and create social unrest, curtailing its spread during such events is a significant data challenge. We present a labeled COVID-19 Twitter dataset, based on socio-linguistic criteria, that can be used to study misinformation during COVID-19 from both a machine learning and a computational linguistics perspective.

    COVID-19 Twitter Data

    C. Moroney, E. Crothers, S. Mittal, A. Joshi, T. Adali, C. Mallinson, N. Japkowicz, Z. Boukouvalas, "The Case for Latent Variable Vs Deep Learning Methods in Misinformation Detection: An Application to COVID-19"
  • Exploiting sparsity and statistical dependence in multivariate data fusion: an application to misinformation detection for high-impact events

  • Social media has become crucial for communication during events like natural disasters and terrorist attacks, but misinformation spreads quickly, affecting decision-making. Detecting misinformation in multi-modal social media data is challenging. This paper proposes novel algorithms based on multi-modal latent variable modeling to address this challenge and demonstrates their effectiveness using simulated and real-world datasets of tweets from high-impact events.

    Code and Data: For access to the code and dataset for the following two papers, please reach out to Zois Boukouvalas (boukouva@american.edu).

    Lucas P. Damasceno, Egzona Rexhepi, Allison Shafer, Ian Whitehouse, Nathalie Japkowicz, Charles C. Cavalcante, Roberto Corizzo, Zois Boukouvalas, "Exploiting sparsity and statistical dependence in multivariate data fusion: an application to misinformation detection for high-impact events"

    Lucas P. Damasceno, Allison Shafer, Nathalie Japkowicz, Charles C. Cavalcante, Zois Boukouvalas, "Efficient Multivariate Data Fusion for Misinformation Detection During High Impact Events"

Matlab Code

  • Independent Vector Analysis using Semi-Parametric Density Estimation via Multivariate Entropy Maximization

  • IVA-M-EMK and M-EMK algorithms: IVA-M-EMK

    References

    [1] L. P. Damasceno, C. C. Cavalcante, T. Adali, and Z. Boukouvalas, "Independent Vector Analysis using Semi-Parametric Density Estimation via Multivariate Entropy Maximization," ICASSP 2021.
  • Sparse ICA: Independence Vs Sparsity

  • For a given dataset, blind source separation (BSS) provides useful decompositions under minimal assumptions, typically by exploiting statistical properties of the data, known as forms of diversity. Two popular forms of diversity that have proven useful in many applications are statistical independence and sparsity. Although many BSS methods take either the statistical independence or the sparsity of the data into account, no unified method has taken both forms of diversity into account simultaneously. The proposed algorithm, SparseICA by entropy bound minimization (SparseICA-EBM), inherits the advantages of independent component analysis (ICA) by entropy bound minimization (ICA-EBM), namely its flexibility, while improving performance by exploiting the sparsity of the underlying sources (when they are indeed sparse). It also enables direct control over the degree to which independence and sparsity are emphasized.

    Sparse ICA by entropy bound minimization: SparseICA-EBM

    References

    [1] Z. Boukouvalas, Y. Levin-Schwartz, Vince D. Calhoun, and T. Adali, "Sparsity and Independence: Balancing of two Objectives in Optimization for Source Separation with Application to fMRI Analysis," Elsevier, Journal of the Franklin Institute (JFI), Engineering and Applied Mathematics, 2017.

    [2] Z. Boukouvalas, Y. Levin-Schwartz, and T. Adali, "Enhancing ICA Performance By Exploiting Sparsity: Application to fMRI Analysis," in Proc. 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), New Orleans, LA, March 2017, pp. 2532-2536.
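
    The trade-off between independence and sparsity described above can be illustrated with a small Python sketch. It is not the SparseICA-EBM cost from [1] or [2]: the independence term here is a maximum-likelihood ICA cost under a fixed sech-type source density (an assumption, standing in for the entropy bound estimator), the sparsity term is a smoothed l1 penalty, and the names `sparse_ica_cost`, `lam`, and `eps` are hypothetical.

```python
import numpy as np


def sparse_ica_cost(W, X, lam, eps=1e-6):
    """Toy cost: ML-ICA term plus lam times a smoothed l1 sparsity penalty.

    W   : (N, N) demixing matrix
    X   : (N, T) observed mixtures
    lam : trade-off weight (hypothetical); lam = 0 ignores sparsity
    """
    Y = W @ X  # source estimates, one row per source
    # log cosh(y), computed stably as logaddexp(y, -y) - log(2)
    log_cosh = np.logaddexp(Y, -Y) - np.log(2.0)
    # Negative log-likelihood (up to a constant) for a density
    # proportional to 1 / cosh(y), plus the usual -log|det W| term
    independence = log_cosh.mean(axis=1).sum() - np.linalg.slogdet(W)[1]
    # Smoothed l1 norm of the source estimates
    sparsity = np.sqrt(Y**2 + eps).mean(axis=1).sum()
    return float(independence + lam * sparsity)


# Two unit-variance Laplacian sources mixed by a 45-degree rotation
rng = np.random.default_rng(0)
S = rng.laplace(scale=1.0 / np.sqrt(2.0), size=(2, 50_000))
c = np.sqrt(0.5)
A = np.array([[c, -c], [c, c]])
X = A @ S

for lam in (0.0, 2.0):
    at_truth = sparse_ica_cost(A.T, X, lam)            # exact demixing
    at_identity = sparse_ica_cost(np.eye(2), X, lam)   # no demixing
    print(lam, at_truth, at_identity)
```

    In this toy setting both terms favor the exact demixing matrix over the identity, and increasing `lam` shifts the emphasis from independence toward sparsity.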
  • Multivariate Generalized Gaussian Distribution (MGGD) and Parameter Estimation

  • The multivariate generalized Gaussian distribution (MGGD) has been an attractive solution to many signal processing problems due to its simple yet flexible parametric form, which requires the estimation of only two parameters: the scatter matrix and the shape parameter. We present code for generating realizations from the MGGD as well as for estimating its parameters [1]. If the shape parameter is less than 1, the distribution of the marginals is super-Gaussian (i.e., more peaky, with heavier tails); if it is greater than 1, the distribution of the marginals is sub-Gaussian (i.e., less peaky, with lighter tails). If the shape parameter is equal to 1, the MGGD reduces to the multivariate Gaussian, so the generated sources are multivariate Gaussian.

    MGGD generation and parameter estimation: MGGD-Generation-Estimation

    References

    [1] Z. Boukouvalas, S. Said, L. Bombrun, Y. Berthoumieu, and T. Adali, "A new Riemannian averaged fixed-point algorithm for MGGD parameter estimation," IEEE Signal Proc. Letts., vol. 22, no. 12, pp. 2314-2318, Dec. 2015.
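
    As a rough Python illustration of the MGGD description above (not a port of the MATLAB code, and not the Riemannian averaged algorithm of [1]), the sketch below draws zero-mean MGGD samples and re-estimates the scatter matrix for a known shape parameter. It assumes the parametrization in which the density is proportional to |S|^(-1/2) exp(-(1/2) (x' S^(-1) x)^beta), so that beta = 1 gives the Gaussian case. The function names are hypothetical, and the plain fixed-point iteration is a simplification that is only used here with beta <= 1.

```python
import numpy as np


def sample_mggd(n, scatter, beta, rng):
    """Draw n zero-mean samples with density ∝ exp(-0.5 * (x' S^-1 x)**beta)."""
    scatter = np.asarray(scatter, dtype=float)
    p = scatter.shape[0]
    # Direction: uniform on the unit sphere
    g = rng.standard_normal((n, p))
    u = g / np.linalg.norm(g, axis=1, keepdims=True)
    # Radius: tau**(2*beta) ~ Gamma(shape=p/(2*beta), scale=2)
    tau = rng.gamma(p / (2.0 * beta), 2.0, size=n) ** (1.0 / (2.0 * beta))
    # Any A with A A' = scatter works; the Cholesky factor is used here
    A = np.linalg.cholesky(scatter)
    return (tau[:, None] * u) @ A.T


def estimate_scatter(x, beta, n_iter=200, tol=1e-8):
    """Plain fixed-point ML iteration for the scatter matrix, beta known."""
    n = x.shape[0]
    scatter = np.cov(x, rowvar=False)  # starting point
    for _ in range(n_iter):
        # Quadratic forms u_i = x_i' S^-1 x_i
        u = np.einsum("ij,ij->i", x @ np.linalg.inv(scatter), x)
        # Stationarity condition: S = (beta / n) * sum_i u_i**(beta-1) x_i x_i'
        new = (beta / n) * (x * u[:, None] ** (beta - 1.0)).T @ x
        if np.linalg.norm(new - scatter) <= tol * np.linalg.norm(scatter):
            return new
        scatter = new
    return scatter


def excess_kurtosis(v):
    """Sample excess kurtosis; positive values indicate heavier tails."""
    z = (v - v.mean()) / v.std()
    return float(np.mean(z**4) - 3.0)


rng = np.random.default_rng(1)
S = np.array([[1.0, 0.5], [0.5, 2.0]])
x_gauss = sample_mggd(200_000, S, 1.0, rng)   # beta = 1: Gaussian
x_super = sample_mggd(100_000, S, 0.5, rng)   # beta < 1: super-Gaussian

print(np.cov(x_gauss, rowvar=False))   # close to S when beta = 1
print(excess_kurtosis(x_super[:, 0]))  # positive for beta < 1
print(estimate_scatter(x_super, 0.5))  # close to S
```

    For beta = 1 the scatter matrix coincides with the covariance matrix; for other values the covariance is a scaled version of it, which is why the second estimate relies on the fixed-point iteration rather than on the sample covariance.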
  • Independent Vector Analysis with Adaptive MGGD (IVA-A-GGD)

  • Due to its flexibility, the MGGD provides an effective model for independent vector analysis (IVA). Because IVA models the latent multivariate variables (the sources), its performance depends highly on the estimation of the source parameters. We present two IVA-A-GGD algorithms that estimate the shape parameter and scatter matrix jointly, while taking both second-order statistics (SOS) and higher-order statistics (HOS) into account. The first is based on a Fisher scoring (FS) algorithm [1] (IVA-A-GGD-MLFS) and the second on a fixed-point (FP) algorithm [2] (IVA-A-GGD-RAFP).

    IVA-A-GGD algorithms: IVA-A-GGD

    References

    [1] Z. Boukouvalas, G.-S. Fu, and T. Adali, "An efficient multivariate generalized Gaussian distribution estimator: Application to IVA," in Proc. Conf. on Info. Sciences and Systems (CISS), Baltimore, MD, March 2015.

    [2] Z. Boukouvalas, S. Said, L. Bombrun, Y. Berthoumieu, and T. Adali, "A new Riemannian averaged fixed-point algorithm for MGGD parameter estimation," IEEE Signal Proc. Letts., vol. 22, no. 12, pp. 2314-2318, Dec. 2015.

Python Code

  • Independent Vector Analysis in Python

  • pyiva is a Python package that implements independent vector analysis (IVA) using a multivariate Laplace prior.

    pyIVA
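
    For intuition about what a multivariate Laplace prior implies, the sketch below evaluates an IVA cost under a spherical multivariate Laplace source model, in which each source component vector (the same source taken across datasets) is penalized by its Euclidean norm. It does not use or reproduce the pyiva API; the function `iva_laplace_cost` and the array layout are assumptions made for illustration.

```python
import numpy as np


def iva_laplace_cost(W, X):
    """IVA cost under a spherical multivariate Laplace source model.

    W : (K, N, N) one demixing matrix per dataset
    X : (K, N, T) K datasets, N channels, T samples
    """
    Y = W @ X  # batched matrix product, shape (K, N, T)
    # Norm of each source component vector across the K datasets
    scv_norm = np.sqrt(np.sum(Y**2, axis=0))  # shape (N, T)
    source_term = scv_norm.mean(axis=1).sum()
    # slogdet works on the whole stack of demixing matrices at once
    _, logabsdet = np.linalg.slogdet(W)
    return float(source_term - logabsdet.sum())


# Toy data: each source shares a random scale across the K datasets,
# which makes it dependent across datasets
rng = np.random.default_rng(0)
K, N, T = 3, 2, 20_000
scale = rng.exponential(size=(1, N, T))
S = scale * rng.standard_normal((K, N, T))

W_aligned = np.stack([np.eye(N)] * K)
W_permuted = W_aligned.copy()
W_permuted[0] = np.eye(N)[::-1]  # swap the two sources in dataset 0 only

print(iva_laplace_cost(W_aligned, S))
print(iva_laplace_cost(W_permuted, S))
```

    Swapping the two sources in one dataset breaks the cross-dataset dependence within each source component vector, and the cost increases. This is the property IVA relies on to keep sources aligned across datasets.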