Settings for population analysis


I recently went through the process of reproducing the GWTC-3 population analysis, and it took me quite a long time to figure out which settings to use.
These include the details of the selection function computation, the requirement of a minimum number of samples per event when performing the Monte Carlo integration, and, more importantly, the exact PE samples that were used in the population analysis. In the end I was told that these could be found in the GWTC-3 population data release, but this was not clearly indicated.
I believe it would be helpful to mention these points somewhere, along with the details of the analysis in general, to make the results easier to reproduce.

Thank you for your attention and all the efforts!




Hi @alextoub7! Thank you for the feedback – analysis reproducibility is definitely very important and is something that we can always strive to improve!

In case it’s useful, this repo hosts my own workflow for collecting and preprocessing PE samples and injections: GitHub - tcallister/get-lvk-data. It hopefully illustrates exactly what samples are used for which events. Also just a disclaimer that every analyst will have their own slightly different workflow, although I believe the various analyses across the GWTC-3 populations paper should all be using these same samples unless otherwise noted.

Definitely let me know if you have any further questions about the population analyses, and I (or others!) can hopefully try to help!

Hi @alextoub7,

Thanks for this suggestion! The article is currently being revised as part of the journal’s peer review process and the published version will include an explicit pointer to this data release in the introduction.

In addition to the repo that Tom linked, I’d also like to point you to the following released codes that were used for some of the analyses in this work. In them, you can find such settings as the exact minimum effective MC sample requirement, and how the public injection sets were used to calculate the selection function. The most recent tags/releases for both correspond to the versions used in the GWTC-3 populations paper.
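For orientation, here is a minimal sketch of how a selection function is commonly estimated from an injection set by importance sampling, together with an effective-sample-size diagnostic. It is not taken from the released codes: the function name, its arguments, and the toy numbers are hypothetical, and the actual threshold on the effective number of samples should be read from those codes.

```python
import numpy as np


def selection_efficiency(pop_density, draw_density, n_total_injections):
    """Importance-sampling estimate of the detected fraction.

    pop_density, draw_density: population and injection-draw densities
    evaluated at the *found* injections only (hypothetical inputs).
    n_total_injections: number of injections drawn, found or not.
    """
    weights = np.asarray(pop_density, dtype=float) / np.asarray(
        draw_density, dtype=float
    )
    # Monte Carlo mean of the weights over all drawn injections
    # (missed injections contribute zero weight).
    mu = np.sum(weights) / n_total_injections
    # Variance of that Monte Carlo mean.
    var = np.sum(weights**2) / n_total_injections**2 - mu**2 / n_total_injections
    # Effective number of injections contributing to the estimate.
    n_eff = mu**2 / var
    return mu, var, n_eff


# Toy usage with made-up densities (illustrative only).
rng = np.random.default_rng(0)
draw = np.full(500, 1.0)
pop = rng.uniform(0.5, 1.5, size=500)
mu, var, n_eff = selection_efficiency(pop, draw, n_total_injections=2000)
print(mu, var, n_eff)
```

An analysis would then typically reject population parameters for which `n_eff` falls below some multiple of the number of events; the multiple itself is a setting to look up rather than assume.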

Please let us know if you have any more questions!


Hi Tom,

Thank you very much! I will have a look to understand how these samples differ from the ones I was using.
I had also tried using gwpopulation to generate the samples from the PE results I downloaded, and they were in good agreement with those from the code I wrote myself. However, when I used these samples to run the population analysis, I got a noticeable difference in the mass distribution: the secondary peak (around 30 Msun) was less pronounced than in the LVC results. I actually got a result closer to the LVC analysis when using gwpopulation to generate samples from the Phenom PE results rather than mixing EOB and Phenom. I was wondering whether I might have been using a newer version of the PE results than the one used for the paper. In any case, when I use the samples provided with the GWTC-3 population data release, my results match the LVC results very well.



Hi Amanda,

Thank you very much for your suggestions. I have been comparing my code to gwpopulation and gwpopulation_pipe; this is actually how I found out about some “tricks”, such as requiring a minimum effective number of samples when performing the MC integration, or marginalising over the MC uncertainty in the computation of the selection function. What I was suggesting is that it would be nice if some of these points could be spelled out somewhere. I believe they are mentioned across the LVC papers, but finding them requires going through all of them in detail ^^ .
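For anyone else trying to reproduce this, the per-event check can be sketched roughly as follows. This is only an illustration under my own assumptions, not the gwpopulation implementation: the function names and the `min_neff` value are hypothetical, and the weights stand for the population-model-over-PE-prior ratio evaluated at each posterior sample.

```python
import numpy as np


def effective_samples(weights):
    """Kish effective sample size of a set of importance weights."""
    weights = np.asarray(weights, dtype=float)
    return np.sum(weights) ** 2 / np.sum(weights**2)


def passes_sample_cut(per_event_weights, min_neff):
    """True if every event has at least `min_neff` effective samples.

    per_event_weights: one array of weights per event (hypothetical input).
    min_neff: illustrative threshold; the value used in a given analysis
    has to be taken from its released code.
    """
    return all(effective_samples(w) >= min_neff for w in per_event_weights)


# Toy usage: one well-sampled event and one dominated by a single sample.
good_event = np.ones(1000)
bad_event = np.array([1.0, 1e-6, 1e-6, 1e-6])
print(effective_samples(good_event), effective_samples(bad_event))
print(passes_sample_cut([good_event, bad_event], min_neff=10))
```

Population parameters that fail the cut are usually assigned zero likelihood, which keeps the sampler away from regions where the Monte Carlo integral is unreliable.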