Request for fully reproducible calibration chain for GWOSC strain data

Dear GWOSC team,

I am an external researcher working with GWOSC data and recently asked @LIGO on Mastodon about the calibration of the public strain h(t). They kindly referred me to this contact form.

I have read the O4a open-data paper (arXiv:2508.18079) and the main calibration papers (Cahillane 2017, Sun 2020/2021 etc.). They describe the method and the uncertainty envelopes, but from an outside perspective the “best calibration version available” remains a black box: external users cannot reconstruct h(t) from the raw control and photon-calibrator signals.

Concretely, I would like to know:

  1. Are there plans to release, as open data,
    – the relevant PCal channels,
    – the DARM error / control signals, and
    – the exact time-dependent filter definitions (FIR/IIR coefficients and switch-over epochs)
    so that public h(t) can be reproduced from scratch?

  2. If such products already exist (for example as internal “calibration frames” or detailed CalEnv files), what would be the correct way for external groups to request access?

To show that this is not a purely theoretical question: I am actively testing independent models against observational data (ESO/ALMA, AKARI, NED etc.), see for example:
https://github.com/error-wtf/Segmented-Spacetime-Mass-Projection-Unified-Results/blob/main/reports/full-output.md

Even a short answer like “no, we do not plan to release the full calibration chain” would already be very helpful, because it clarifies what level of reproducibility is realistically achievable for external teams.

Thank you very much for your time and for providing the GWOSC data to the community.

Best regards,

Lino Casu

└─# python fetch_ligo.py --channel H1:DCS-CALIB_STRAIN_CLEAN_C01_AR --start 1240583610 --duration 100 --out h1_o3_clean_c01_1240583610_100.hdf5
Fetching channel: H1:DCS-CALIB_STRAIN_CLEAN_C01_AR
From 1240583610.0 to 1240583710.0 (GPS)
Host: losc-nds ligo org

and

└─# python fetch_ligo.py --channel H1:DCS-CALIB_STRAIN_CLEAN_C01_AR --start 1240583610 --duration 100 --out h1_o3_clean_c01_1240583610_100.hdf5
Fetching channel: H1:DCS-CALIB_STRAIN_CLEAN_C01_AR
From 1240583610.0 to 1240583710.0 (GPS)
Host: nds gwosc org

Neither of these works.

What is wrong with my script?

Hi @Lino,

Thank you for your question. No, at this time, we do not have plans to release the data for DARM, p-cal, or calibration models.

To make a request for additional data, we ask requestors to write a technical note with the following information:

  • What data are being requested and why?
  • What is the motivation for a public data release?
  • What is the time range for the data release?
  • What channels are needed?

You can see an example request in the DCC.

I would be happy to look into a request like this. However, my understanding is that the calibration chain includes a number of inputs, and it may be a challenge to identify, document, and release all the relevant pieces with the person-power available.

As for your script, I recommend checking the host name.

Here is an example of accessing data on NDS2:

from gwpy.timeseries import TimeSeries
# Fetch 10 s of H1 strain over NDS2; note the dots in the host name
data = TimeSeries.fetch('H1:DCS-CALIB_STRAIN_CLEAN_C01_AR', start=1240559616, end=1240559626, host='nds.gwosc.org')
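
If it helps, here is a minimal sketch of how a script could catch a malformed host string before it ever contacts the server. It is only an illustration: the helper names looks_like_hostname, gps_window and fetch_strain are made up for this example and are not part of GWpy or of your fetch_ligo.py, and the check is purely syntactic, so it cannot tell whether a server actually exists at that name.

```python
import re

# Assumption: a simple DNS-label check (letters, digits, hyphens; no leading or
# trailing hyphen). It validates the spelling only, not that the host is reachable.
_LABEL = re.compile(r"(?!-)[A-Za-z0-9-]{1,63}(?<!-)")


def looks_like_hostname(host):
    """Return True if host is a dotted name made only of valid DNS labels."""
    labels = host.split(".")
    return len(labels) >= 2 and all(_LABEL.fullmatch(label) for label in labels)


def gps_window(start, duration):
    """Turn a start/duration pair into the (start, end) pair used by fetch()."""
    if duration <= 0:
        raise ValueError("duration must be positive")
    return start, start + duration


def fetch_strain(channel, start, duration, host="nds.gwosc.org"):
    """Fetch one channel over NDS2 with GWpy after checking the host string."""
    if not looks_like_hostname(host):
        raise ValueError(f"suspicious host name: {host!r}")
    # Imported lazily so the helpers above work without GWpy installed.
    from gwpy.timeseries import TimeSeries

    gps_start, gps_end = gps_window(start, duration)
    return TimeSeries.fetch(channel, start=gps_start, end=gps_end, host=host)
```

With these helpers, a host printed as "nds gwosc org" is rejected immediately, while "nds.gwosc.org" passes the check and the request goes on to the server.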

Hi Jonah,

thank you again for your help and for pointing me to the O3 alternate calibration channel.

After finally managing to fetch a short segment from
H1:DCS-CALIB_STRAIN_CLEAN_C01_AR via NDS2, I inspected the actual HDF5 file I obtained (for 1 s at GPS 1240559616):

  • The file contains one single dataset named H1:DCS-CALIB_STRAIN_CLEAN_C01_AR of shape (16384,), float64 – i.e. 1 second at 16384 Hz.

  • The only attributes attached to this dataset are essentially:

    • x0 (GPS start time),

    • dx (sample spacing),

    • unit = “strain”,

    • name / channel.

  • There are no additional groups or datasets and no richer metadata:
    no DARM error/control signals,
    no p-cal data,
    no FIR/IIR filter definitions or coefficients,
    no time-dependent correction factors,
    no explicit calibration version or uncertainty model.
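
For reference, an inspection of this kind can be sketched as follows. This is only an illustration under assumptions: the function names summarize_series and list_hdf5_contents are hypothetical, the file is assumed to be a plain HDF5 file readable with h5py, and the attribute names x0 and dx are the ones listed above.

```python
def summarize_series(n_samples, x0, dx):
    """Derive sample rate, duration and end time from the timing attributes."""
    return {
        "sample_rate_hz": 1.0 / dx,
        "duration_s": n_samples * dx,
        "gps_end": x0 + n_samples * dx,
    }


def list_hdf5_contents(path):
    """Return (name, shape, dtype, attribute names) for every dataset in a file."""
    # Imported lazily so summarize_series works without h5py installed.
    import h5py

    found = []

    def visit(name, obj):
        # Groups are skipped; only datasets carry sample data.
        if isinstance(obj, h5py.Dataset):
            found.append((name, obj.shape, str(obj.dtype), sorted(obj.attrs.keys())))

    with h5py.File(path, "r") as handle:
        handle.visititems(visit)
    return found
```

For the segment described here, summarize_series(16384, 1240559616, 1 / 16384) gives a sample rate of 16384 Hz and a duration of 1 s, which matches the shape (16384,) of the single dataset.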

In other words: what I get in the “O3 Alternate Calibration Release” HDF5 file is, in practice, just a 1D array of calibrated strain values h(t) with minimal timing information – scientifically indistinguishable from the regular GWOSC strain product. There is nothing in the file that would allow me to:

  • reconstruct h(t) from more primitive inputs, or

  • compare different calibration models on the basis of their ingredients.

From an external user’s perspective, this feels very misleading:

  • The wording “Alternate Calibration Release via GWpy” and the dedicated documentation strongly suggest that there is some substantially richer calibration information available.

  • In reality, what is released looks like exactly the same kind of black-box end product as the standard strain, just with a different calibration already baked in and no way to inspect how it was built.

Given that, I honestly do not understand the purpose of advertising this as an “alternate calibration release”. If the community only ever sees finished h(t) arrays with a label that says “this was calibrated differently”, but never the underlying inputs or filter chain, it is impossible for independent teams to truly reproduce or critically test the calibration.

I appreciate your personal effort in answering my questions, but I find the overall situation very frustrating: as someone trying to do careful, independent analysis, the data that are presented as “open” and even as “alternate calibration” remain, in practice, opaque end products. The terminology and tutorials give the impression of transparency, while the actual files do not contain the information one would normally associate with a calibration release.

Best regards,
Lino

Thank you for your helpful feedback. I agree that what is currently released is not enough information to reproduce the calibration. To request that additional channels be released, I recommend writing a technical note with the details and motivation for your request, as described above.

I also agree the NDS2 server is slow today. I will try to understand why. Pulling data from NDS2 requires an explicit list of which channels to access. At this time, there are 3 versions of the LIGO strain channel available, listed here:

O4a was released with richer frame files that contain a variety of metadata. I don’t think it’s enough to reproduce the calibration, but these are exactly the frame files used for LVK analysis. I would encourage you to look at these as well.