Generating Synthetic GW data

Over time, machine learning models have found increasing use in denoising picture and time-series data. To find the hidden signals beneath the detector noise, I intend to test ML models on GW data analysis. For this, I intend to create synthetic GW signals (GW injections) and add them to detector noise as well as to Gaussian noise. Then I will test the model on real LIGO data around the detected O1 + O2 + O3 events.
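For concreteness, the injection step I mean can be sketched in plain NumPy (a toy chirp-like stand-in added to white Gaussian noise; all parameter values here are illustrative, not a real GW waveform):

```python
import numpy as np

fs = 4096                       # sampling rate (Hz), illustrative
t = np.arange(0, 1.0, 1 / fs)   # 1 s of data

# Toy "chirp": frequency sweeping upward with growing amplitude
# (NOT a physical waveform -- just a stand-in for the injection)
f0, f1 = 35.0, 350.0
phase = 2 * np.pi * (f0 * t + 0.5 * (f1 - f0) * t**2)
signal = 0.1 * (t / t[-1]) * np.sin(phase)

rng = np.random.default_rng(42)
noise = rng.normal(0.0, 1.0, size=t.size)   # white Gaussian noise

data = noise + signal   # the injection: a hidden signal beneath the noise
print(data.shape)       # (4096,)
```

The same addition works unchanged when `noise` is replaced by colored detector noise, as in the PyCBC example further down the thread.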
Q: I am facing issues generating the synthetic data. I have referred to a few papers and GitHub repositories and found some relevant links. The repository below is good, but its code is difficult to run and a few of the libraries it uses have since been deprecated in Python:
ggwd

If anyone here has done similar work, please help me generate the synthetic data so I can carry my work forward. Thanks in advance!


@sanjeev7881 Great question!

Here are a few suggestions:

You could start with the dataset generated for the GW Kaggle competition. I checked, and you can still download the dataset even though the competition is over:

Or use pyCBC:

Or use bilby:

Good luck!

Or, if it helps, here is a short example script that I use for this:

import pycbc.noise
import pycbc.psd
import pycbc.waveform.utils
from pycbc import frame
from pycbc.waveform import get_td_waveform

# -- Exercise 2
# -- Find signal in colored Gaussian noise

fs = 4096

# -- Make some noise
# The color of the noise matches a PSD which you provide
flow = 10.0
delta_f = 1.0 / 16
flen = int(2048 / delta_f) + 1
psd = pycbc.psd.analytical.aLIGOZeroDetHighPower(flen, delta_f, flow)

# Generate noise at 4096 Hz
delta_t = 1.0 / fs
tsamples = int(128 / delta_t)
ts = pycbc.noise.noise_from_psd(tsamples, delta_t, psd, seed=127)

# -- Make a BBH waveform (the masses here are just example values)
hp, hc = get_td_waveform(approximant="SEOBNRv4_opt",
                         mass1=20, mass2=20,
                         delta_t=delta_t,
                         f_lower=flow)

# -- Taper the start of the waveform to avoid edge artifacts
hp = pycbc.waveform.utils.td_taper(hp, -0.5, -0.4, beta=8, side='left')
hp = zeropad(hp, len(ts), -15)
hp.start_time = ts.start_time

# -- Put the signal in the data
data_set = ts + hp
frame.write_frame("./datafile.gwf", "H1:SIMULATED", data_set)

Thanks a bunch, Dr. Jonah, I am really glad for your help. I have finally found a few good ways to generate synthetic GW data. Recently I read an article which says that instead of injecting signals into Gaussian noise, it is better to inject them into real detector noise (which contains detector noise, possibly with glitches, and has the exact PSD that real events are embedded in), i.e. realistic noise. So I want to download detector noise that contains no GW signals. Where can one download this (per detector)?
Also, which library does the “zeropad” in the above code belong to?


If you want to download strain data from previous runs, you can get it from the GWOSC website. Navigate to the run you want to download from, pick the 4 kHz or 16 kHz sampling-rate data, then choose a time interval and a detector, and you should end up on an “Archive” page with a list of download links. Let me know if you need more detailed steps. Files are about 120 MB each.


Thanks for the reply. If I download the strain data from the GWOSC website, there is a chance that the downloaded data contains GW signals, right? Is it possible to download strain data without GW signals? How should I adjust the time interval to achieve that?

There is no strain data without the signals, as we don’t remove them when making the data public.
But all the detections were made public in the catalogs, so you can check against the catalogs and avoid using times too close to a known detection.
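The check described above can be sketched in plain Python. The three event times below are real O1 examples, but in practice you would load the complete list from the GWOSC event catalogs, and the 60 s margin is just an illustrative choice:

```python
# GPS times (s) of known detections -- a short illustrative excerpt;
# pull the full list from the GWOSC event catalogs in practice.
known_events = {
    "GW150914": 1126259462.4,
    "GW151012": 1128678900.4,
    "GW151226": 1135136350.6,
}

def segment_is_clean(start, end, events=known_events, margin=60.0):
    """True if [start, end] stays at least `margin` seconds away
    from every catalogued event time (hypothetical helper)."""
    return all(t_ev < start - margin or t_ev > end + margin
               for t_ev in events.values())

# A segment well before GW150914 can serve as signal-free noise...
print(segment_is_clean(1126250000, 1126254096))   # True
# ...but one containing the event cannot.
print(segment_is_clean(1126259000, 1126260000))   # False
```

Note this only excludes *catalogued* signals; sub-threshold signals may still be present in any public strain segment.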

When I was training my neural network, something went wrong, so I checked the data-generation process. I found that in some of the generated samples M1 is less than M2 (M1 < M2, not in all samples). In reality, the primary mass should be greater than the secondary mass (M1 > M2), right? Is it okay if M1 is less than M2?

Generally, the parameter space (prior) of the training set is predetermined, as in standard Bayesian inference. By convention, m1 is defined to be the larger mass, i.e. m1 > m2. If you do not impose this constraint, the parameter distribution in your training set becomes unpredictable and uncontrollable.
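A minimal sketch of enforcing that convention when sampling the training prior (the uniform component-mass prior and its bounds are purely illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000

# Draw two component masses from an illustrative uniform prior (solar masses)
a = rng.uniform(5.0, 50.0, size=n)
b = rng.uniform(5.0, 50.0, size=n)

# Enforce the labelling convention m1 >= m2 by sorting each pair
# (relabelling, rather than rejecting samples, so no draws are wasted)
m1 = np.maximum(a, b)
m2 = np.minimum(a, b)

q = m2 / m1                 # mass ratio is then guaranteed to satisfy q <= 1
print(bool(np.all(m1 >= m2)))   # True
print(bool(q.max() <= 1.0))     # True
```

Sorting just relabels which component is called "primary", so it does not distort the physical distribution of the binaries, only makes the parameterization unambiguous for the network.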
