The content on the website is built on a general set of “best principles” that apply to genomics studies irrespective of the specific sequencing method used. Rather than prescribing rigid “best practices”, the website promotes “best principles”, defined as a guiding set of values and goals that can be tailored to one’s hypotheses and guided by one’s data, as opposed to a specific set of instructions. Motivated by rigor and reproducibility, these principles are designed to encourage data exploration and critical thinking during analysis and evaluation, rather than ticking boxes on a protocol or step list. They are outlined below.
Choose the goals of the study before choosing a sequencing approach. The approach then informs the total number of samples and the sequencing coverage needed: e.g., PoolSeq requires larger sample sizes and deeper coverage given the lack of individual genotyping (Guirao-Rico and González 2021).
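As a rough illustration of how a coverage target translates into sequencing effort, the sketch below uses the standard approximation that expected coverage equals the number of reads times the read length divided by the genome size. The function names and the example numbers (a 1 Mb genome, 150 bp reads, 30x target) are hypothetical, and the calculation ignores duplicates, unmapped reads, and uneven coverage, so it should be read as a lower bound rather than a recommendation.

```python
import math


def expected_coverage(n_reads, read_length, genome_size):
    """Expected mean coverage: total sequenced bases divided by genome size."""
    return n_reads * read_length / genome_size


def reads_needed(target_coverage, read_length, genome_size):
    """Smallest number of reads whose expected coverage meets the target."""
    return math.ceil(target_coverage * genome_size / read_length)


# Hypothetical example values, chosen only to make the arithmetic easy to check.
reads = reads_needed(target_coverage=30, read_length=150, genome_size=1_000_000)
coverage = expected_coverage(reads, read_length=150, genome_size=1_000_000)
print(reads, coverage)
```

Running the same calculation for each candidate sequencing approach, with that approach's own coverage requirement, gives a quick way to compare their costs before committing to a design.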
Deepen the interpretation of results and flag sources of error throughout a workflow by plotting data such as (i) read-quality metrics pre- and post-filtering, (ii) sequence coverage across a reference and across samples, (iii) principal component analysis of replicates pre- and post-filtering, and (iv) results and predictions of statistical tests.
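One of the plots listed above, sequence coverage across a reference and across samples, might be sketched as follows. This is a minimal example under stated assumptions: per-base depths are assumed to be already available as plain lists (e.g., parsed from a depth table), the sample names, window size, and depth threshold are hypothetical, and plotting relies on matplotlib being installed.

```python
from statistics import fmean


def windowed_mean_depth(depths, window):
    """Mean depth in consecutive, non-overlapping windows along a reference."""
    if window <= 0:
        raise ValueError("window must be positive")
    return [fmean(depths[i:i + window]) for i in range(0, len(depths), window)]


def low_coverage_windows(window_means, min_depth):
    """Indices of windows whose mean depth falls below the threshold."""
    return [i for i, depth in enumerate(window_means) if depth < min_depth]


def plot_coverage(samples, window, min_depth, path="coverage.png"):
    """Draw one coverage line per sample and save the figure to `path`."""
    import matplotlib
    matplotlib.use("Agg")  # render to file without needing a display
    import matplotlib.pyplot as plt

    fig, ax = plt.subplots()
    for name, depths in samples.items():
        means = windowed_mean_depth(depths, window)
        starts = [i * window for i in range(len(means))]
        ax.plot(starts, means, label=name)
    ax.axhline(min_depth, linestyle="--", color="grey")
    ax.set_xlabel("Window start position (bp)")
    ax.set_ylabel("Mean depth")
    ax.legend()
    fig.savefig(path)
    plt.close(fig)


# Toy per-base depths for two hypothetical samples (not real data).
samples = {
    "sample_A": [30] * 10 + [5] * 10 + [28] * 10,
    "sample_B": [25] * 30,
}
flagged = {
    name: low_coverage_windows(windowed_mean_depth(depths, 10), min_depth=10)
    for name, depths in samples.items()
}
print(flagged)
```

Calling `plot_coverage(samples, window=10, min_depth=10)` on the same input before and after filtering produces a pair of figures that can be compared side by side.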
This issue is particularly acute in non-model species. Some quantitative approaches to evaluating methods include (i) comparing the performance of different methods or parameter choices using simulated data (Lotterhos, Fitzpatrick, and Blackmon 2022), (ii) measuring their predictive strength using model selection statistics (Hooten and Hobbs 2015; Johnson and Omland 2004), and (iii) inspecting observed-versus-predicted plots from model outputs. A basic understanding of how sensitive inference is in different analyses will help determine how robust the results are to subtle analytical decisions, especially for non-model organisms or unique experimental designs.
All metadata that can be reported should accompany sequence data deposited in databases such as those hosted by NCBI (e.g., the Sequence Read Archive, SRA). Data on Dryad or GitHub should cross-link to the corresponding NCBI/SRA records.
Use text-annotated code notebooks for bioinformatic analyses (e.g., Rmarkdown, Jupyter).
Provide these notebooks (in Rmarkdown or Jupyter) in a publicly accessible format on services such as GitHub, GitLab, Dryad, Figshare, or Zenodo.