Tutorial Questions - Day 2

Yes, dividing the Fourier transform of your signal by the square root of the PSD is indeed the whitening. The interpolation is done only to match the frequency array of the original PSD with that of the series to be whitened: if your time-domain series has N samples, its Fourier transform is associated with N equally spaced frequencies ranging from -sampling_rate/2 to +sampling_rate/2. Your original PSD is likely to have a different number of points, say K frequencies equally spaced between -sampling_rate/2 and +sampling_rate/2, so you have to interpolate it to obtain an array matching the N equally spaced frequencies.
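
If it helps, here is a minimal numpy/scipy sketch of that interpolation-plus-division (the array and function names are placeholders of mine, not from the tutorial; in PyCBC, pycbc.psd.interpolate does the equivalent grid matching):

import numpy as np
from scipy.interpolate import interp1d

def whiten(data, dt, psd_freqs, psd_vals):
    """Whiten `data` (time series, spacing dt) by a PSD sampled at
    `psd_freqs` (K points), interpolated onto the data's own grid."""
    N = len(data)
    dft = np.fft.rfft(data)              # one-sided FFT of the real data
    freqs = np.fft.rfftfreq(N, d=dt)     # its N//2 + 1 frequencies
    # Interpolate the K-point PSD onto the FFT's frequency grid;
    # inf outside the PSD's range zeroes those bins after division.
    psd_interp = interp1d(psd_freqs, psd_vals, bounds_error=False,
                          fill_value=np.inf)(freqs)
    white_dft = dft / np.sqrt(psd_interp)   # divide by the ASD
    return np.fft.irfft(white_dft, n=N)     # back to the time domain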


Hi @Eileen-Meyer-UMBC, yes, we are collecting the presentations and we will add them to Thinkific.

XLAL Error - XLALFrStreamFileOpen (LALFrStream.c:128): No files in stream file cache
XLAL Error - XLALFrStreamFileOpen (LALFrStream.c:128): Invalid argument
XLAL Error - XLALFrStreamCacheOpen (LALFrStream.c:233): Internal function call failed: Invalid argument

RuntimeError                              Traceback (most recent call last)
in <cell line: 15>()
     14 afile = files
     15 for afile in files:
---> 16     ts = read_frame(afile, channel_name, start, end)
     17     print("File {}".format(afile))
     18

/usr/local/lib/python3.10/dist-packages/pycbc/frame/frame.py in read_frame(location, channels, start_time, end_time, duration, check_integrity, sieve)
    203                                        None, None, None)
    204
--> 205     stream = lalframe.FrStreamCacheOpen(cum_cache)
    206     stream.mode = lalframe.FR_STREAM_VERBOSE_MODE
    207

RuntimeError: Internal function call failed: Invalid argument
I'm getting this error in Tutorial 2.2. Please suggest how to correct this.

Hi,

psd = pycbc.psd.aLIGOZeroDetHighPower(flen, delta_f, flow)

Can anybody explain what aLIGOZeroDetHighPower is doing? What do flow = 10 and flen mean?

"ts = pycbc.noise.noise_from_psd(data_length*sample_rate, delta_t, psd, seed=127)". Here, too, how was seed = 127 chosen?
Can you please tell me how you picked those values, 10 and 127?

Thank you.
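
For reference, here is a sketch annotating those calls; the comments are my own reading of the PyCBC API, and the numerical values are assumptions, not taken from the tutorial:

import pycbc.psd
import pycbc.noise

sample_rate = 4096        # Hz (assumed)
data_length = 128         # seconds of noise to generate (assumed)
delta_f = 1.0 / 16        # frequency spacing of the PSD array (assumed)
flow = 10.0               # low-frequency cutoff: the PSD model is not evaluated below this
flen = int(sample_rate / (2 * delta_f)) + 1  # number of frequency samples up to Nyquist

# Analytic PSD of Advanced LIGO at design sensitivity, in its
# "zero-detuning, high power" configuration; flen, delta_f, and flow
# only define the frequency grid the model is evaluated on.
psd = pycbc.psd.aLIGOZeroDetHighPower(flen, delta_f, flow)

# seed=127 is not physics: it only seeds the random number generator so
# the same noise realization comes out on every run; any integer works.
delta_t = 1.0 / sample_rate
ts = pycbc.noise.noise_from_psd(data_length * sample_rate, delta_t, psd, seed=127)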

@GBRK5 Usually, an error like that means the software can’t find the file or channel that you requested. You should check that the filename and channel name you used don’t have any typos.
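
For example, something like the following should work once the names are right (the file name, channel string, and GPS times below are hypothetical; substitute the ones from your download):

from pycbc.frame import read_frame

# For GWOSC frame files the channel usually looks like
# "H1:GWOSC-4KHZ_R1_STRAIN"; the prefix must match the detector,
# and the string must match the frame contents exactly.
fname = "H-H1_GWOSC_4KHZ_R1-1126257414-4096.gwf"   # hypothetical file
channel = "H1:GWOSC-4KHZ_R1_STRAIN"                # hypothetical channel
ts = read_frame(fname, channel, start_time=1126259446, end_time=1126259478)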

What is the role of

data_whitened = (ts.to_frequencyseries() / psd_td**0.5).to_timeseries()
hp1_whitened = (hp1.to_frequencyseries() / psd_hp1**0.5).to_timeseries() * 1E-21

Hi @AmbicaG, in these lines we take the time-series data, transform it to the frequency domain, whiten it, and transform it back to a time series. This is done because it is easier to perform the whitening in the frequency domain. The second line does the same, and then multiplies the time series by a factor of 10^(-21) to create a realistic strain as it would be seen by our detectors. Multiplying the strain by this factor is equivalent to placing the source at a distance of 10 Mpc from us.

Why is there a division by the square root of the PSD?

Hi @AmbicaG, the square root of the PSD is called the ASD, the amplitude spectral density, and it is the actual quantity used to whiten the data. The division itself mitigates frequency-dependent noise effects. Here are some references: Gravitational-wave sensitivity curves | Christopher Berry, Spectral density - Wikipedia, and https://arxiv.org/pdf/1408.0740.pdf . Hope this helps!

While working on the Tuto_2.3_Signal_consistency_and_significance.ipynb notebook, I noted that here we are using pycbc.filter.matched_filter to do matched filtering of the data with a frequency-domain template (generated via pycbc.waveform.get_fd_waveform). Everything works fine.

I noticed that previously, in the Tuto_2.2_Matched_Filtering_In_action.ipynb notebook, we did the same with pycbc.waveform.get_td_waveform, i.e., with a time-domain template.

Wondering whether pycbc.filter.matched_filter can handle both, I tried using a time-domain template in the third tutorial, Tuto_2.3_Signal_consistency_and_significance.ipynb, but it turns out that there is no SNR peak at the signal.

Look at the code:

hp, _ = get_fd_waveform(approximant="IMRPhenomD",
                         mass1=cmass, mass2=cmass,
                         f_lower=20.0, delta_f=data[ifo].delta_f)
hp.resize(len(psd[ifo]))
#? only 1 template is generated, since delta_f is same for each ifo

# For each observatory use this template to calculate the SNR time series
snr = {}
for ifo in ifos:
    snr[ifo] = matched_filter(hp, data[ifo], psd=psd[ifo], low_frequency_cutoff=20)
    snr[ifo] = snr[ifo].crop(5, 4)

This is what was given in the tutorial notebook file.

Now, what I did was

hp, _ = get_td_waveform(approximant="IMRPhenomD",
                         mass1=cmass, mass2=cmass,
                         f_lower=20.0, delta_t=data[ifo].delta_t)
hp.resize(len(data[ifo]))

# For each observatory use this template to calculate the SNR time series
snr = {}
for ifo in ifos:
    snr[ifo] = matched_filter(hp, data[ifo], psd=psd[ifo], low_frequency_cutoff=20)
    snr[ifo] = snr[ifo].crop(5, 4)

But now the SNR plot showed no peak at the signal.

I'm unsure what is going on here.

Hi, I’m not sure I understand the issue of filter “wrapping around the input” in the context of Tutorial 2.2. Can you explain more clearly?

I went through the documentation but didn’t understand what the method inverse_spectrum_truncation does.

I have two more questions:

  1. “We need to account for both the length of the template and 1 / PSD.” I do not understand this either. I know this has to do with the spikes appearing in the SNR plot but perhaps you could explain more clearly what’s going on?
  2. I don’t understand the point of using aligned = (aligned.to_frequencyseries() * snrp).to_timeseries() after aligned /= sigma(aligned, psd=psd, low_frequency_cutoff=20.0). Why are we scaling the template amplitude twice?

In Tutorial 2.2, there is a line of code for the Q-transform: t, f, p = data.whiten(4, 4).qtransform(.001, logfsteps=100, qrange=(8, 8), frange=(20, 512)). What do the two parameters (.001 and logfsteps) in the qtransform function represent?
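
For what it's worth, here is my reading of those arguments based on the PyCBC API (annotations mine, not an answer from the tutorial team):

# Assumes `data` is the strain TimeSeries from the tutorial.
t, f, p = data.whiten(4, 4).qtransform(
    .001,              # output time spacing in seconds (one spectrogram column per millisecond)
    logfsteps=100,     # number of logarithmically spaced frequency bins in the output
    qrange=(8, 8),     # range of Q values to search; (8, 8) fixes Q = 8
    frange=(20, 512),  # frequency band of the transform, in Hz
)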

Okay, I figured out what the problem was.

I forgot to do

hp = hp.cyclic_time_shift(hp.start_time)

After doing that step (for the td_waveform), I could get the signal at the exact same GPS time; the SNR value was a bit off (a 0.00114% change, checked for the Livingston peak only, since it was the most significant one).

That is okay, but note that after generating the td template, its start time (the hp.start_time value) was -3.8 s, which we shifted to 0 s with cyclic_time_shift.

Since the start time was just some constant number of seconds, the SNR peak should occur at that time delay from the actual signal time, right? I.e., at the actual signal time + (-3.8 s). But somehow that was not the case when I didn't do the cyclic_time_shift step: the peak was totally outside the data containing the signal, which is 27.75 s long with the signal roughly at the center!

I checked with the challenge question section of Tutorial 2.3 that it does happen like that: the peak at 104.00341796875 s got shifted to 100.20361328125 s if I don't use
hp = hp.cyclic_time_shift(hp.start_time), and the time difference is -3.7998 s, while hp.start_time is -3.8 s (also, the SNR changed by 1.249%, which is not what we're interested in now, but given for completeness).
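
For anyone following along, here is a sketch of the fix described above, assuming cmass, data, psd, and ifos are defined as in Tuto 2.3:

from pycbc.waveform import get_td_waveform
from pycbc.filter import matched_filter

# A time-domain template starts at a negative time (hp.start_time is
# about -3.8 s here); matched_filter expects the template's fiducial
# time to sit at index 0, so rotate it there before filtering.
hp, _ = get_td_waveform(approximant="IMRPhenomD",
                        mass1=cmass, mass2=cmass,
                        f_lower=20.0, delta_t=data[ifos[0]].delta_t)
hp.resize(len(data[ifos[0]]))
hp = hp.cyclic_time_shift(hp.start_time)  # rotate the merger to index 0

snr = {}
for ifo in ifos:
    snr[ifo] = matched_filter(hp, data[ifo], psd=psd[ifo],
                              low_frequency_cutoff=20)
    snr[ifo] = snr[ifo].crop(5, 4)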

Thank you for your question. The issue of wrapping data is explained in the paragraph above:

“Note the spike in the data at the boundaries. This is caused by the highpass and resampling stages filtering the data. When the filter is applied to the boundaries, it wraps around to the beginning of the data. Since the data itself has a discontinuity (i.e. it is not cyclic) the filter itself will ring off for a time up to the length of the filter.”

The high pass and resampling filters both leave numerical garbage at the beginning and end of the time series. To remove the garbage, the ends of the data are trimmed after filtering, like this:

# Remove 2 seconds of data from both the beginning and end
conditioned = strain.crop(2, 2)

Trimming two seconds from each end is enough here because the corrupted region extends at most the length of the filter.

These “wrapping” features are a common issue in signal processing. The Fourier transform assumes that time-series data are periodic, so whenever time-series data are transformed into the frequency domain, any mismatch between the beginning and end of the segment acts as a discontinuity and can introduce artifacts.
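
For concreteness, these are the conditioning steps being discussed, roughly as they appear in Tutorial 2.2 (assuming strain holds the raw TimeSeries):

from pycbc.filter import highpass, resample_to_delta_t

strain = highpass(strain, 15.0)                  # remove content below 15 Hz
strain = resample_to_delta_t(strain, 1.0/2048)   # downsample to 2048 Hz
conditioned = strain.crop(2, 2)                  # trim the filter ring-off at both ends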

Great question! The explanation is in the comment above the line of code. The method truncates the inverse PSD so that 1/PSD acts as a filter of limited length (4 seconds here), and the low-frequency cutoff tells it to ignore content below 15 Hz. This is important for LIGO data, because the highpassed data are not meaningful at very low frequencies.

# 1/PSD will now act as a filter with an effective length of 4 seconds
# Since the data has been highpassed above 15 Hz, and will have low values
# below this we need to inform the function to not include frequencies
# below this frequency.
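
Putting it together, the usage looks like this (a sketch assuming `conditioned` is the conditioned strain from the tutorial): estimate the PSD, match it to the data's frequency grid, then truncate 1/PSD to an effective 4 s filter that ignores the band below 15 Hz.

from pycbc.psd import interpolate, inverse_spectrum_truncation

psd = conditioned.psd(4)                     # estimate the PSD from 4 s segments
psd = interpolate(psd, conditioned.delta_f)  # match the data's frequency grid
psd = inverse_spectrum_truncation(psd, int(4 * conditioned.sample_rate),
                                  low_frequency_cutoff=15)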

@AmbicaG For question 2, the answer is that we are scaling the template to the correct amplitude in two steps.

The first step divides out the amplitude of the template, so that it is scaled to SNR = 1.

The second step multiplies by the recovered SNR, so that the template will have the same SNR as the signal in the data.
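
In code, those two steps are exactly the lines quoted in the question, annotated here (aligned is the template time series, snrp the complex SNR at the peak, psd the data's PSD, all as in the tutorial):

from pycbc.filter import sigma

# Step 1: divide out the template's own amplitude so it has SNR = 1.
aligned /= sigma(aligned, psd=psd, low_frequency_cutoff=20.0)
# Step 2: scale (and phase-rotate) by the measured complex SNR so the
# template matches the amplitude and phase of the signal in the data.
aligned = (aligned.to_frequencyseries() * snrp).to_timeseries()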


How is the filter length determined?

@AmbicaG The division by the square root of the PSD flattens the response of the detector across the sensitivity band (hence the name whitening: it means no frequency dependence). You can also think of it another way: the GW detectors are less sensitive at lower and higher frequencies, so the division by sqrt(PSD) downweights these frequencies in the data. I hope this makes sense!
