Data-driven astronomy explained

Data-driven astronomy (DDA) refers to the use of data science in astronomy. The outputs of telescopic observations and sky surveys are analyzed, filtered, and normalized using data mining and big data management techniques. The resulting data sets are then used for classification, prediction, and anomaly detection by means of advanced statistical approaches, digital image processing, and machine learning. Astronomers and space scientists use the output of these processes to study and identify patterns, anomalies, and movements in outer space and to draw conclusions about theories and discoveries in the cosmos.

History

In 2007, the Galaxy Zoo project[1] was launched for the morphological classification[2][3] of a large number of galaxies. The project considered 900,000 images taken by the Sloan Digital Sky Survey (SDSS)[4] over the preceding 7 years. The task was to study each picture of a galaxy, classify it as elliptical or spiral, and determine whether it was spinning or not. A team of astrophysicists led by Kevin Schawinski at Oxford University was in charge of the project, and Schawinski and his colleague Chris Lintott estimated that it would take such a team 3–5 years to complete the work.[5] They then came up with the idea of using machine learning and data science techniques to analyze and classify the images.

Methodology

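The data retrieved from sky surveys first undergo preprocessing, in which redundancies are removed and the data are filtered. Feature extraction is then performed on the filtered data set, and the extracted features are passed on to further processing.[6] Some of the renowned sky surveys are the Palomar Digital Sky Survey (DPOSS)[7], the Two Micron All Sky Survey (2MASS)[8], the Green Bank Telescope (GBT)[9], the Galaxy Evolution Explorer (GALEX)[10], the SkyMapper Southern Sky Survey[11], Pan-STARRS1[12], the Large Synoptic Survey Telescope (LSST) at the Rubin Observatory[13], and the Square Kilometre Array (SKA)[14].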

The size of the data from the above-mentioned sky surveys ranges from 3 TB to almost 4.6 EB. The data mining tasks involved in managing and manipulating these data include classification, regression, clustering, anomaly detection, and time-series analysis. Several approaches and applications exist for each of these methods.
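The preprocessing and feature-extraction steps described in this section can be sketched in a few lines. This is only a minimal illustration under stated assumptions: the catalogue rows, the field names (`id`, `g_mag`, `r_mag`), and the choice of a g-r colour index as the extracted feature are all hypothetical and are not taken from any particular survey.

```python
# Hypothetical catalogue rows; field names and values are invented for illustration.
raw_rows = [
    {"id": "obj1", "g_mag": 18.2, "r_mag": 17.6},
    {"id": "obj2", "g_mag": 20.1, "r_mag": 19.9},
    {"id": "obj1", "g_mag": 18.2, "r_mag": 17.6},  # redundant duplicate
    {"id": "obj3", "g_mag": None, "r_mag": 21.0},  # missing measurement
    {"id": "obj4", "g_mag": 19.0, "r_mag": 18.0},
]


def preprocess(rows):
    """Remove redundancies: keep the first row per id and drop incomplete rows."""
    seen = set()
    clean = []
    for row in rows:
        if row["id"] in seen or None in row.values():
            continue
        seen.add(row["id"])
        clean.append(row)
    return clean


def extract_features(rows):
    """Derive a g-r colour index and min-max scale r_mag to the range [0, 1]."""
    r_values = [row["r_mag"] for row in rows]
    lo, hi = min(r_values), max(r_values)
    span = (hi - lo) or 1.0  # avoid dividing by zero if all values are equal
    return [
        {
            "id": row["id"],
            "g_minus_r": round(row["g_mag"] - row["r_mag"], 3),
            "r_scaled": (row["r_mag"] - lo) / span,
        }
        for row in rows
    ]


features = extract_features(preprocess(raw_rows))
for feature in features:
    print(feature)
```

Real survey pipelines work on far larger catalogues and typically rely on dedicated libraries; the sketch only shows the order of the steps (deduplicate, filter, then derive features).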

Classification

Classification[15] is used to identify and categorize astronomical data, for example in spectral classification, photometric classification, morphological classification, and the classification of solar activity. The approaches of classification techniques are listed below:
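As a minimal sketch of morphological classification, the example below labels a galaxy as elliptical or spiral with a k-nearest-neighbours vote. The technique is chosen here purely for illustration and is not taken from the source; the two features (a concentration index and a g-r colour) and all training values are hypothetical.

```python
import math
from collections import Counter

# Hypothetical training set: (concentration index, g-r colour) -> morphology label.
training = [
    ((3.1, 0.80), "elliptical"),
    ((3.3, 0.75), "elliptical"),
    ((2.9, 0.85), "elliptical"),
    ((2.1, 0.45), "spiral"),
    ((2.3, 0.40), "spiral"),
    ((2.0, 0.50), "spiral"),
]


def knn_classify(point, samples, k=3):
    """Return the majority label among the k training samples closest to point."""
    nearest = sorted(samples, key=lambda s: math.dist(point, s[0]))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


print(knn_classify((3.0, 0.78), training))  # elliptical
print(knn_classify((2.2, 0.42), training))  # spiral
```

In practice the features would be scaled to comparable ranges before computing distances, since an unscaled feature with a large numeric range would dominate the vote.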

Regression

Regression is used to make predictions from the retrieved data through statistical trends and statistical modeling. In astronomy, it is used to estimate photometric redshifts and to measure the physical parameters of stars.[16] The approaches are listed below:
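To make the idea of a photometric redshift estimate concrete, the sketch below fits an ordinary least-squares line that predicts redshift from a single colour index. It is an illustration only: the linear model, the single-colour input, and the sample values are assumptions, and real photometric redshift estimation uses several bands and more flexible models.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical calibration sample: colour index and spectroscopic redshift.
colours = [0.4, 0.6, 0.8, 1.0]
redshifts = [0.12, 0.21, 0.33, 0.40]

slope, intercept = fit_line(colours, redshifts)


def predict_redshift(colour):
    """Estimate a redshift for a new object from its colour index."""
    return slope * colour + intercept


print(round(slope, 3), round(intercept, 3))
print(round(predict_redshift(0.7), 3))
```

The fitted line always passes through the mean of the calibration sample, which gives a quick sanity check on the result.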

Clustering

Clustering[17] groups objects based on a similarity metric. It is used in astronomy for classification as well as for the detection of special or rare objects. The approaches are listed below:
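Grouping by a similarity metric can be illustrated with a small k-means sketch that uses Euclidean distance as the (dis)similarity measure. k-means is chosen here only as a familiar example and is not taken from the source; the two-dimensional points and the fixed starting centroids are hypothetical.

```python
import math


def nearest_index(point, centroids):
    """Index of the centroid closest to point under Euclidean distance."""
    return min(range(len(centroids)), key=lambda i: math.dist(point, centroids[i]))


def kmeans(points, centroids, iterations=10):
    """Plain k-means: alternate between assigning points and updating centroids."""
    centroids = list(centroids)
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            clusters[nearest_index(p, centroids)].append(p)
        centroids = [
            tuple(sum(coord) / len(cluster) for coord in zip(*cluster))
            if cluster
            else centroids[i]  # keep the old centroid if a cluster is empty
            for i, cluster in enumerate(clusters)
        ]
    labels = [nearest_index(p, centroids) for p in points]
    return centroids, labels


# Hypothetical feature vectors forming two well-separated groups.
points = [(1.0, 1.0), (1.2, 0.8), (0.8, 1.2), (5.0, 5.0), (5.2, 4.8), (4.8, 5.2)]
centroids, labels = kmeans(points, centroids=[points[0], points[3]])
print(labels)  # [0, 0, 0, 1, 1, 1]
```

Fixed starting centroids keep the run deterministic; with random initialisation the cluster numbering, and sometimes the clusters themselves, can differ between runs.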

Anomaly detection

Anomaly detection[19] is used to detect irregularities in a data set. In astronomy, this technique is used to detect rare or special objects. The following approaches are used:
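One simple way to flag a rare object is a robust outlier test. The sketch below uses the modified z-score based on the median and the median absolute deviation (MAD), with the commonly used cutoff of 3.5. The method is an illustrative choice, not one taken from the source, and the magnitude values are hypothetical.

```python
import statistics


def modified_z_outliers(values, threshold=3.5):
    """Return indices whose modified z-score (median/MAD based) exceeds threshold."""
    median = statistics.median(values)
    mad = statistics.median(abs(v - median) for v in values)
    if mad == 0:
        return []  # no spread in the bulk of the data, so nothing can be scored
    return [
        i
        for i, v in enumerate(values)
        if 0.6745 * abs(v - median) / mad > threshold
    ]


# Hypothetical magnitudes; the sixth value is far from the rest.
magnitudes = [15.1, 15.3, 14.9, 15.0, 15.2, 22.7, 15.1]
print(modified_z_outliers(magnitudes))  # [5]
```

The median and MAD are used instead of the mean and standard deviation because a single extreme value barely shifts them, so the outlier cannot hide itself by inflating the spread.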

Time-series analysis

Time-series analysis[20] helps in analyzing trends and predicting outputs over time. It is used for trend prediction and novelty detection (the detection of unknown data). The approaches used here are:
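Both uses can be sketched on a short light curve: a moving average exposes the trend, and a point is flagged as novel when it departs from the mean of the preceding window by more than a threshold. This is a minimal illustration, not a method taken from the source; the flux values, the window length, and the threshold are hypothetical.

```python
def moving_average(values, window):
    """Mean of each run of `window` consecutive values (a simple trend estimate)."""
    return [
        sum(values[i - window + 1 : i + 1]) / window
        for i in range(window - 1, len(values))
    ]


def flag_novelties(values, window, threshold):
    """Indices whose value differs from the mean of the previous window by > threshold."""
    flagged = []
    for i in range(window, len(values)):
        baseline = sum(values[i - window : i]) / window
        if abs(values[i] - baseline) > threshold:
            flagged.append(i)
    return flagged


# Hypothetical flux measurements at equally spaced times; index 5 is a sudden flare.
flux = [10.0, 10.2, 9.9, 10.1, 10.0, 14.5, 10.1, 9.8]

trend = moving_average(flux, window=3)
print([round(t, 2) for t in trend])
print(flag_novelties(flux, window=3, threshold=2.0))  # [5]
```

Note that the flare also raises the baseline for the points that follow it, so the threshold has to be chosen with the window length in mind.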

Notes and References

  1. "Zooniverse". www.zooniverse.org. Retrieved 2024-05-10.
  2. Cavanagh, Mitchell K.; Bekki, Kenji; Groves, Brent A. (2021-07-08). "Morphological classification of galaxies with deep learning: comparing 3-way and 4-way CNNs". Monthly Notices of the Royal Astronomical Society. 506 (1): 659–676. arXiv:2106.01571. doi:10.1093/mnras/stab1552. ISSN 0035-8711.
  3. Goyal, Lalit Mohan; Arora, Maanak; Pandey, Tushar; Mittal, Mamta (2020-12-01). "Morphological classification of galaxies using Conv-nets". Earth Science Informatics. 13 (4): 1427–1436. doi:10.1007/s12145-020-00526-w. ISSN 1865-0481.
  4. "Sloan Digital Sky Survey-V: Pioneering Panoptic Spectroscopy - SDSS-V". Retrieved 2024-05-10.
  5. Pati, Satavisa (2021-06-18). "How Data Science is Used in Astronomy?". Analytics Insight. Retrieved 2024-05-10.
  6. Zhang, Yanxia; Zhao, Yongheng (2015-05-22). "Astronomy in the Big Data Era". Data Science Journal. 14: 11. Bibcode:2015DatSJ..14...11Z. doi:10.5334/dsj-2015-011. ISSN 1683-1470.
  7. "The Palomar Digital Sky Survey (DPOSS)". sites.astro.caltech.edu. Retrieved 2024-05-10.
  8. "IRSA - Two Micron All Sky Survey (2MASS)". irsa.ipac.caltech.edu. Retrieved 2024-05-10.
  9. "GBT". Green Bank Observatory. 2023-06-26. Retrieved 2024-05-10.
  10. "GALEX - Galaxy Evolution Explorer". www.galex.caltech.edu. Retrieved 2024-05-10.
  11. "SkyMapper Southern Sky Survey". skymapper.anu.edu.au. Retrieved 2024-05-10.
  12. "Pan-STARRS1 data archive home page - PS1 Public Archive - STScI Outerspace". outerspace.stsci.edu. Retrieved 2024-05-10.
  13. Large Synoptic Survey Telescope. "Rubin Observatory". Rubin Observatory. Retrieved 2024-05-10.
  14. "Explore SKAO". www.skao.int. Retrieved 2024-05-10.
  15. Chowdhury, Shovan; Schoen, Marco P. (2020-10-02). "Research Paper Classification using Supervised Machine Learning Techniques". 2020 Intermountain Engineering, Technology and Computing (IETC). IEEE. pp. 1–6. doi:10.1109/IETC47856.2020.9249211. ISBN 978-1-7281-4291-3. https://ieeexplore.ieee.org/document/9249211.
  16. "Bulletin de la Société Royale des Sciences de Liège PoPuPS". Bulletin de la Société Royale des Sciences de Liège (in French). ISSN 0037-9565.
  17. Bindra, Kamalpreet; Mishra, Anuranjan (September 2017). "A detailed study of clustering algorithms". 2017 6th International Conference on Reliability, Infocom Technologies and Optimization (Trends and Future Directions) (ICRITO). IEEE. pp. 371–376. doi:10.1109/ICRITO.2017.8342454. ISBN 978-1-5090-3012-5. https://ieeexplore.ieee.org/document/8342454.
  18. Pizzuti, C.; Talia, D. (May 2003). "P-autoclass: scalable parallel clustering for mining large data sets". IEEE Transactions on Knowledge and Data Engineering. 15 (3): 629–641. doi:10.1109/TKDE.2003.1198395. ISSN 1041-4347.
  19. Thudumu, Srikanth; Branch, Philip; Jin, Jiong; Singh, Jugdutt (Jack) (2020-07-02). "A comprehensive survey of anomaly detection techniques for high dimensional big data". Journal of Big Data. 7 (1): 42. doi:10.1186/s40537-020-00320-x. hdl:10536/DRO/DU:30158643. ISSN 2196-1115.
  20. Weiner, Irving B. (2003-04-15). Handbook of Psychology (1 ed.). Wiley. doi:10.1002/0471264385.wei0223. ISBN 978-0-471-17669-5.