Gravity Spy time-domain glitch data generation

Hi, I have recently been working on a data analysis problem that requires glitch data, but I have run into some problems with the data I obtained.
How I get the data:
There is a CSV file named ‘trainingset_v1d1_metadata.csv’ that records each glitch's event time, start time, duration, label, and other information.
I use the event time of each glitch and the gwpy function get_urls to download the GWOSC files that contain those glitches.
I then select the time series from the files.

The problem:
The information in the CSV file does not seem reliable:

  1. Some glitches fall in data made up of NaN points; in other words, the data quality of some glitches is 0 (the data quality flag is a value from 0 to 127, representing the state of 7 channels). The glitches with the label “helix” in O1, L1 are all in NaN segments.
  2. Some glitches are not visible between start time and start time + duration, but are visible between start time and start time + 2 * duration. I have no idea what is going on here.

Can anyone familiar with Gravity Spy tell me how they obtain the glitch time-domain data? And are there any files other than the ones obtained with get_urls (from gwpy) that contain more information (some kind of raw data, perhaps)?

Hi! Thank you for the question.

I would guess you are looking at the data set 10.5281/zenodo.1476550

I don’t have much experience with this data set, but if some of the times are outside of observing mode, then it is possible that time-domain data for those times are not publicly available.

For the start time, I noticed that the spreadsheet has columns both for start_time and start_time_ns. To get the start time of the glitch, I think you need to calculate:

start = start_time + 1e-9*start_time_ns

I wonder if that’s what’s causing some glitches to be outside the window? It might help if you provide some specific examples of this.
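To illustrate why the nanosecond column could matter, here is a minimal sketch. The numbers are made up for illustration and are not a row from the spreadsheet, and the helper names `glitch_window` and `overlap` are hypothetical. It compares the window you get when start_time_ns is ignored with the window you get when it is included:

```python
def glitch_window(start_time, start_time_ns, duration):
    """Return the (start, end) GPS times of a glitch, in seconds."""
    start = start_time + 1e-9 * start_time_ns
    return start, start + duration


def overlap(a, b):
    """Length in seconds of the overlap between two (start, end) windows."""
    return max(0.0, min(a[1], b[1]) - max(a[0], b[0]))


# Illustrative values only, not taken from the CSV file.
start_time, start_time_ns, duration = 1000000000, 900000000, 0.25

# Window that ignores the nanosecond column.
naive = (start_time, start_time + duration)
# Window that includes the nanosecond column.
full = glitch_window(start_time, start_time_ns, duration)

print("naive window:", naive)
print("full window: ", full)
print("overlap (s): ", overlap(naive, full))
```

With these assumed values the two windows do not overlap at all, so a glitch could appear to sit outside a window built from start_time alone.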

Good luck!

@BarryG A colleague pointed me to this more recent Gravity Spy data set, which might be helpful for your study.

@jonah Thanks for your patience, Jonah. I tried the new Gravity Spy data set, but the problem remains. A specific example is shown in the code below.

First, I used the code that Gravity Spy provides to obtain the information about those glitches. I selected the ones with the label “Helix”:

    from gwpy.table import GravitySpyTable

    # Load the L1 O1 classifications and keep the confident "Helix" glitches
    L1_O1 = GravitySpyTable.read('../csv/L1_O1.csv')
    selected = L1_O1[(L1_O1["ml_label"] == "Helix") & (L1_O1["ml_confidence"] > 0.9)]

    # Combine integer seconds and nanoseconds into one GPS start time
    test_startTime = selected[0]['start_time'] + selected[0]['start_time_ns'] * 1e-9
    test_duration = selected[0]['duration']

After that, I use the gwpy function get_urls to obtain the files that contain the glitch (4096-second files), and then use test_startTime and test_duration to see what is going on:

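    from gwpy.timeseries import TimeSeries

    # datadir + fn is the path to the downloaded 4096 s GWOSC file
    strain = TimeSeries.read(datadir + fn, format='hdf5.gwosc')
    t0 = strain.t0.value             # GPS start time of the file
    fs = strain.sample_rate.value    # 4096 Hz for these files
    # Convert the glitch window from GPS time to sample indices
    glitch_startTime = test_startTime - t0
    glitch_endTime = glitch_startTime + test_duration
    glitch_startSample = int(glitch_startTime * fs)
    glitch_endSample = int(glitch_endTime * fs)
    glitch_val = strain.value[glitch_startSample:glitch_endSample]
    print(glitch_val)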

After doing so, I still find that the glitch values are all NaN, which is pretty confusing.
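To tell apart a window that is entirely NaN, one that is only partly NaN, and an empty slice (which would happen if the sample indices fell outside the file), a small check like the following might help. It is only a sketch: the helper name `nan_fraction` is hypothetical, and the sample values are invented rather than taken from a GWOSC file.

```python
import math


def nan_fraction(samples):
    """Return the fraction of NaN values in a sequence of samples.

    Returns NaN itself when the sequence is empty, which signals that
    the slice selected no samples at all.
    """
    samples = list(samples)
    if not samples:
        return float("nan")
    n_nan = sum(1 for x in samples if math.isnan(x))
    return n_nan / len(samples)


# Invented strain values, for illustration only.
window = [float("nan"), float("nan"), 1.0e-19, -2.0e-19]
print(nan_fraction(window))
```

The same function could be called on an array such as glitch_val from the snippet above.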

@BarryG You can see lists of times with available data via the Timeline App.

For example, for O3a, see the link below.

Times outside of these segment lists do not have public data available, because the detectors were not in a good operating state. Unavailable times are represented as NaNs. My advice is that you should check the time of each glitch against this segment list, before trying to download the data, and ignore any glitches where data are not available.
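The check described above could be sketched roughly as follows. This is a minimal illustration in plain Python: the segment list and glitch times are hypothetical placeholders (real segments would come from the Timeline App), and `covering_segment` is a made-up helper name.

```python
def covering_segment(segments, start, end):
    """Return the (seg_start, seg_end) that fully contains [start, end].

    Returns None if no single segment covers the whole window.
    """
    for seg_start, seg_end in segments:
        if seg_start <= start and end <= seg_end:
            return (seg_start, seg_end)
    return None


# Hypothetical segments of available data, as (GPS start, GPS end) pairs.
segments = [(1000000000, 1000004096), (1000010000, 1000020000)]

# Hypothetical glitches, as (GPS start, duration in seconds) pairs.
glitches = [
    (1000000100.5, 0.2),    # fully inside the first segment
    (1000005000.0, 0.1),    # in the gap between segments
    (1000019999.95, 0.1),   # runs past the end of the second segment
]

# Keep only glitches whose whole window lies in available data.
usable = [
    (start, duration)
    for start, duration in glitches
    if covering_segment(segments, start, start + duration) is not None
]
print(usable)
```

Requiring the whole window to sit inside one segment is a deliberately strict choice here; a glitch that straddles a segment boundary would otherwise contain NaNs in part of its window.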

Thanks for reminding me of that! My previous example was not a good one, sorry!

For example, consider row 46 of the file L1_O1.csv in “Gravity Spy Machine Learning Classifications of LIGO Glitches from Observing Runs O1, O2, O3a, and O3b”.
This Helix glitch has event time = 1128421735.72266, start_time = 1128421735, start_time_ns = 703125000, duration = 0.132809996604919.
If we use start_time + start_time_ns*1e-9 as the start time and use the duration to select the time series, we find that the time series falls in NaNs.
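As a quick sanity check on the numbers quoted for this row, the window arithmetic can be written out as below. This is only a sketch of the calculation described above. It does not read any strain data, so it says nothing about whether public data exist at that time.

```python
# Values quoted above for row 46 of L1_O1.csv.
event_time = 1128421735.72266
start_time = 1128421735
start_time_ns = 703125000
duration = 0.132809996604919

# Window computed as start_time + start_time_ns * 1e-9, plus the duration.
start = start_time + 1e-9 * start_time_ns
end = start + duration

# Check whether the quoted event time lies inside that window.
inside = start <= event_time <= end
print(f"window: [{start:.6f}, {end:.6f}]")
print("event time inside window:", inside)
```

The event time falls inside the computed window, so the metadata for this row look self-consistent, and the NaNs are probably not caused by a mistake in the start time calculation.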