Conflict on FARs between the papers and data products in zenodo

Hi! I recently studied the publication data products in GWTC2.1/3 and found that quite a lot of the meta data on the false alarm rate (GstLAL/PyCBC/PyCBC-BBH) in Zenodo is in conflict with the publications. I don’t know why, and maybe there is something wrong with my naive codes.

The following is a showcase of the data products (zenodo) for PyCBC-BBH (pycbc_highmass) search pipelines comparing with GWTC-3 paper (only the data under the blue line is consistent here)
FYI, I also pasted the codes for these results.

import os
import numpy as np
import pandas as pd
from tqdm import tqdm

import fnmatch
from gwpy.table import Table
ffname = lambda path, pat: [filename for filename in os.listdir(path) if fnmatch.fnmatch(filename, pat)]

# GWTC3 pycbc_highmass
addr = os.path.join('/Users/herb/Github/VisibleGWevents/GWTC3_triggers/search_data_products/', 'pycbc_highmass')
for i, name in enumerate(tqdm(ffname(addr, '*.xml'))):
    # sngl_inspiral
    table = Table.read(os.path.join(addr, name), tablename="sngl_inspiral")
    columns = ['ifo', 'mass1', 'mass2', 'mchirp', 'mtotal', 'spin1z', 'spin2z', 
               'snr', 'chisq', 'end_time', 'end_time_ns', 'coa_phase', 'template_duration']
    temp = pd.DataFrame(table.values(), index=table.colnames).T[columns]
    df_sngl = temp[['mass1', 'mass2', 'mtotal', 'spin1z', 'spin2z', 
                    'template_duration']].iloc[:1]
    for ifo in temp.ifo:
        df_sngl[f'snr_{ifo}'] = temp[temp.ifo==ifo].snr.iloc[0]
df_sngl[f'chisq_ifo'] = temp[temp.ifo==ifo].chisq.iloc[0]
        df_sngl[f'coa_phase_{ifo}'] = temp[temp.ifo==ifo].coa_phase.iloc[0]

    # coinc_inspiral
    table = Table.read(os.path.join(addr, name), tablename="coinc_inspiral")
    table['ifos', 'mchirp', 'mass', 'snr', 'end_time', 'end_time_ns', 'combined_far', 'minimum_duration']
    df_coinc = pd.DataFrame(table.values(), index=table.colnames). T
    df_temp = pd.merge(df_sngl, df_coinc, left_index=True, right_index=True)
    if i==0:
        GWTC3_pycbc_highmass = df_temp.copy()
    else:
        GWTC3_pycbc_highmass = GWTC3_pycbc_highmass.append(df_temp)

GWTC3_pycbc_highmass.loc[:,'time'] = GWTC3_pycbc_highmass.end_time + GWTC3_pycbc_highmass.end_time_ns*1e-9
# Get events table from GWOSC
from gwpy.table import EventTable
catalogs = [#"GWTC-1-confident", "GWTC-2", "GWTC-2.1-confident", 
            "GWTC-3-confident",
            "GWTC-3-marginal", 'GWTC-2.1-marginal', 'GWTC-1-marginal', 'GWTC-2.1-auxiliary'][:1]

event_table = {}
for catalog in tqdm(catalogs):
    event_table[catalog] = EventTable.fetch_open_data(
        catalog,
        columns=("name", "mass_1_source", "mass_2_source", "GPS",
                 "network_matched_filter_snr", "luminosity_distance", "p_astro"),
    ).to_pandas()

event_table = pd.concat([event_table[catalog] for catalog in catalogs])
print('event name\t\t gps in table \t gps in meta \t\t far',)
for idx in range(len(event_table)):
    ename = event_table.name.iloc[idx]
    gps_in_table = event_table.GPS.iloc[idx]
    series = GWTC3_pycbc_highmass [GWTC3_pycbc_highmass.time.map(lambda tc: (tc >= gps_in_table-0.1) & (tc <= gps_in_table+0.1))]
    if len(series) == 1:
        far = series.false_alarm_rate.iloc[0]
        gps_in_meta = series.time.iloc[0]
    elif len(series)==0:
        far = '\tNone\t'
        gps_in_meta = 'None\t'
    else:
        raise
    print(ename,'\t', gps_in_table, '\t', gps_in_meta,'\t',far)
1 Like

@IPhysReserch Thank you for the great question!

I took a look at the things you’ve shared here. While I haven’t figured out exactly what’s going on, there are a few things that caught my eye:

When I look at the Jupyter notebook posted with the zenodo data release, it looks like the combined_far column is in units of Hz, where the paper is in units of per year. So, to compare these numbers, there would need to be some conversion factor. I don’t see that conversion in the code you shared (did I miss it?). When I look at the output you shared, it looks like the results are consistently different by a factor of around 14.2. I wonder if the conversion factor you used from seconds to years has a typo that could account for this? Anyway, that’s my first guess. If needed, I can take a closer look later this week.

1 Like

@IPhysReserch I was able to reproduce the published result for the first event on your list using the zenodo data. Here’s the code I used:

from gwpy.table import Table

dirname = 'search_data_products/pycbc_highmass/'
fn  = dirname + 'H1L1-PYCBC_HighMass-1256779567-1.xml'
tbldata = Table.read(fn, tablename="coinc_inspiral")
conversion = 3600*24*365  # seconds to years
far = tbldata['combined_far']*conversion
print(far)

This yields the following output:

combined_far   
-------------------
0.45796341399568324

This FAR value seems to match both the first line in the published PDF table and the value shown on the GWOSC Event Portal. As my best guess, you may want to check the conversion factor from seconds to years in the code you are using.

Good luck!

1 Like

@jonah Thanks a lot! I missed the convention, but now I can reproduce the value in the published reports.
:smiley:

1 Like