<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom">
    <channel>
        <title>doc</title>
        <description>Documentation for BEAST X v10</description>
        <link>http://github.com/beast-dev/</link>
        <atom:link href="http://github.com/beast-dev/feed.xml" rel="self" type="application/rss+xml"/>
        <pubDate>Sun, 08 Mar 2026 16:40:58 +0000</pubDate>
        <lastBuildDate>Sun, 08 Mar 2026 16:40:58 +0000</lastBuildDate>
        <generator>Jekyll v3.10.0</generator>
        
        <item>
            <title>Some notes on pattern compression and speeding up BEAST analysis</title>
            <description>&lt;h3 id=&quot;pattern-compression-for-fun-and-profit&quot;&gt;Pattern compression for fun and profit&lt;/h3&gt;

&lt;p&gt;by Andrew Rambaut &amp;amp; Philippe Lemey&lt;/p&gt;

&lt;p&gt;To calculate the likelihood of a tree given a sequence alignment, the log likelihood of each site (column in the alignment) is evaluated independently and then summed across all sites in the alignment. The likelihood of any sites that have exactly the same nucleotides for each taxon will be identical (assuming they have been assigned the same substitution model and rate). So, in practice, BEAST ‘compresses’ the alignment into a set of unique patterns of nucleotides (referred to as just ‘patterns’) together with a count of how many times each occurs. Thus, the log likelihood of the alignment is the sum of each pattern’s log likelihood times its count. This trick provides considerable speed improvements over naively calculating the likelihood of each site independently and is used by most (all?) phylogenetic software.&lt;/p&gt;
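&lt;p&gt;As a minimal illustration of the idea (a Python toy, not BEAST’s actual code; the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;pattern_log_likelihood&lt;/code&gt; function is a hypothetical stand-in for the per-pattern pruning calculation), pattern compression amounts to collapsing the alignment’s columns into unique patterns with counts and then summing count-weighted per-pattern log likelihoods:&lt;/p&gt;

```python
from collections import Counter

def compress_patterns(alignment):
    """Collapse alignment columns into unique site patterns with counts.

    alignment: list of equal-length sequences, one per taxon.
    Returns a Counter mapping each column pattern (a string with one
    character per taxon) to the number of sites showing that pattern.
    """
    columns = zip(*alignment)  # iterate over sites (columns)
    return Counter("".join(column) for column in columns)

def alignment_log_likelihood(alignment, pattern_log_likelihood):
    """Sum per-pattern log likelihoods, weighted by pattern counts."""
    patterns = compress_patterns(alignment)
    return sum(count * pattern_log_likelihood(pattern)
               for pattern, count in patterns.items())

# Toy example: 3 taxa, 6 sites, only 3 unique patterns
alignment = ["AACGGA", "AACGGA", "AATGGA"]
print(compress_patterns(alignment))  # Counter({'AAA': 3, 'GGG': 2, 'CCT': 1})
```

&lt;p&gt;The expensive per-pattern calculation is then done once per unique pattern, however many sites share it.&lt;/p&gt;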

&lt;p&gt;In the latest version (v10.5.0), BEAUti now shows the number of unique site patterns for a loaded alignment as well as the original length of the alignment:&lt;/p&gt;

&lt;figure style=&quot;width: 320px;&quot;&gt;&lt;img class=&quot;docimage&quot; src=&quot;images/news/Patterns_BEAUti.png&quot; alt=&quot;&quot; style=&quot;max-width: 320px&quot; /&gt;&lt;figcaption&gt;Figure 1 | Pattern count in BEAUti data partition table.&lt;/figcaption&gt;&lt;/figure&gt;

&lt;p&gt;This example, using the file &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;WNV.fasta&lt;/code&gt; from the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;examples/Data&lt;/code&gt; folder, shows that this alignment of West Nile Virus genomes is compressed from over 11,000 nucleotides to only 727 unique site patterns - a compression of over 15 times, with a commensurate improvement in likelihood calculation speed. The less variable the sequences in the alignment are, the greater the level of compression will be. This is because almost all of the compression comes from the ‘constant’ sites that have the same nucleotide for every taxon. Other sites, which do contain nucleotide changes, are unlikely to share exactly the same pattern with another site and so cannot be compressed.&lt;/p&gt;

&lt;p&gt;Here are the 15 most common site patterns for the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;WNV.fasta&lt;/code&gt; alignment:&lt;/p&gt;

&lt;table&gt;
  &lt;thead&gt;
    &lt;tr&gt;
      &lt;th&gt;&lt;strong&gt;count&lt;/strong&gt;&lt;/th&gt;
      &lt;th&gt;&lt;strong&gt;pattern&lt;/strong&gt;&lt;/th&gt;
    &lt;/tr&gt;
  &lt;/thead&gt;
  &lt;tbody&gt;
    &lt;tr&gt;
      &lt;td&gt;2821&lt;/td&gt;
      &lt;td&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;GGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGG&lt;/code&gt;&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;2635&lt;/td&gt;
      &lt;td&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA&lt;/code&gt;&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;2025&lt;/td&gt;
      &lt;td&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;TTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTT&lt;/code&gt;&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;1963&lt;/td&gt;
      &lt;td&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC&lt;/code&gt;&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;60&lt;/td&gt;
      &lt;td&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAANAAAAAAAAAAAAAAA&lt;/code&gt;&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;56&lt;/td&gt;
      &lt;td&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGNGGGGGGGGGGGGGGG&lt;/code&gt;&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;49&lt;/td&gt;
      &lt;td&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCNCCCCCCCCCCCCCCC&lt;/code&gt;&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;35&lt;/td&gt;
      &lt;td&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGG&lt;/code&gt;&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;30&lt;/td&gt;
      &lt;td&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC&lt;/code&gt;&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;26&lt;/td&gt;
      &lt;td&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA&lt;/code&gt;&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;24&lt;/td&gt;
      &lt;td&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTNTTTTTTTTTTTTTTT&lt;/code&gt;&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;21&lt;/td&gt;
      &lt;td&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTT&lt;/code&gt;&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;11&lt;/td&gt;
      &lt;td&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAANNNNAAAAAAAAAAAAAAA&lt;/code&gt;&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;11&lt;/td&gt;
      &lt;td&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTNNNTTTTTTTTTTTTTTTT&lt;/code&gt;&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;10&lt;/td&gt;
      &lt;td&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCNNNNCCCCCCCCCCCCCCC&lt;/code&gt;&lt;/td&gt;
    &lt;/tr&gt;
  &lt;/tbody&gt;
&lt;/table&gt;

&lt;p&gt;The &lt;a href=&quot;wnv_pattern_table.html&quot;&gt;full table can be seen here&lt;/a&gt;. The first observation (as expected) is that the constant sites are the 4 most common patterns by a large margin (they make up 9,444 of the 11,029 sites in the alignment). The second is that most of the other high-frequency patterns are ones that are constant except for ambiguities (&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;N&lt;/code&gt;s) in some of the taxa. This is a feature of next-generation sequencing, where areas of low read coverage are filled in with &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;N&lt;/code&gt;s. Ambiguities are treated as unknown nucleotides in BEAST and are efficiently dealt with in the likelihood calculations done by &lt;a href=&quot;beagle&quot;&gt;BEAGLE&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;This second observation offers an opportunity for further compression of the patterns. If we assume that patterns that are constant except for ambiguities are, in fact, constant, then we can simply add their counts to those of the respective constant patterns. For example, in the table above the 5th pattern comprises only &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;A&lt;/code&gt;s and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;N&lt;/code&gt;s so might be considered a constant &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;A&lt;/code&gt; pattern. The 10th and 13th patterns are also constant &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;A&lt;/code&gt;s but differ in their pattern of &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;N&lt;/code&gt;s, so would be compressed in the same way. Doing this for the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;WNV.fasta&lt;/code&gt; alignment reduces the 727 unique site patterns to 604 ‘ambiguous constant’ site patterns – a reduction of 17%. When run in BEAST with some standard models/settings, the run-time drops from 11.2 minutes to 9.9 minutes – a reduction of 12%.&lt;/p&gt;
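&lt;p&gt;As an illustrative Python toy (a sketch under stated assumptions, not BEAST’s implementation), the ambiguous-constant step folds any pattern whose non-ambiguous characters are all one nucleotide, and whose fraction of &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;N&lt;/code&gt;s does not exceed a threshold, into the corresponding constant pattern:&lt;/p&gt;

```python
def merge_ambiguous_constants(pattern_counts, num_taxa, threshold=0.25):
    """Fold 'constant except for Ns' patterns into constant patterns.

    pattern_counts: dict mapping pattern string to site count.
    threshold: maximum fraction of ambiguous characters ('N') that a
    pattern may contain while still being treated as constant.
    """
    merged = {}
    for pattern, count in pattern_counts.items():
        bases = set(pattern) - {"N"}
        n_fraction = pattern.count("N") / num_taxa
        if len(bases) == 1 and threshold >= n_fraction:
            # Constant apart from Ns: count it as the constant pattern
            pattern = bases.pop() * num_taxa
        merged[pattern] = merged.get(pattern, 0) + count
    return merged

counts = {"AAAA": 10, "NAAA": 3, "ANNA": 2, "ACGT": 1}
print(merge_ambiguous_constants(counts, num_taxa=4, threshold=0.25))
# {'AAAA': 13, 'ANNA': 2, 'ACGT': 1}
```

&lt;p&gt;With the default threshold of 0.25, the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;NAAA&lt;/code&gt; pattern above is folded into the constant &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;A&lt;/code&gt; pattern, while &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;ANNA&lt;/code&gt; (half ambiguous) is left alone.&lt;/p&gt;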

&lt;p&gt;For other data sets, a much larger saving can be achieved. Here are a few examples:&lt;/p&gt;

&lt;table&gt;
  &lt;thead&gt;
    &lt;tr&gt;
      &lt;th&gt;&lt;strong&gt;Virus&lt;/strong&gt;&lt;/th&gt;
      &lt;th&gt;&lt;strong&gt;Dataset&lt;/strong&gt;&lt;/th&gt;
      &lt;th&gt;&lt;strong&gt;Sequences&lt;/strong&gt;&lt;/th&gt;
      &lt;th&gt;&lt;strong&gt;Sites&lt;/strong&gt;&lt;/th&gt;
      &lt;th&gt;&lt;strong&gt;Unique patterns&lt;/strong&gt;&lt;/th&gt;
      &lt;th&gt;&lt;strong&gt;Compressed ambiguous&lt;/strong&gt;&lt;/th&gt;
      &lt;th&gt;&lt;strong&gt;Factor&lt;/strong&gt;&lt;/th&gt;
    &lt;/tr&gt;
  &lt;/thead&gt;
  &lt;tbody&gt;
    &lt;tr&gt;
      &lt;td&gt;SARS-CoV-2&lt;/td&gt;
      &lt;td&gt;B.1.1.7&lt;/td&gt;
      &lt;td&gt;976&lt;/td&gt;
      &lt;td&gt;29409&lt;/td&gt;
      &lt;td&gt;2079&lt;/td&gt;
      &lt;td&gt;918&lt;/td&gt;
      &lt;td&gt;2.3x&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;SARS-CoV-2&lt;/td&gt;
      &lt;td&gt;omicron_BA1&lt;/td&gt;
      &lt;td&gt;1000&lt;/td&gt;
      &lt;td&gt;29409&lt;/td&gt;
      &lt;td&gt;1528&lt;/td&gt;
      &lt;td&gt;485&lt;/td&gt;
      &lt;td&gt;3.2x&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;Ebolavirus&lt;/td&gt;
      &lt;td&gt;Makona_1610&lt;/td&gt;
      &lt;td&gt;1610&lt;/td&gt;
      &lt;td&gt;18992&lt;/td&gt;
      &lt;td&gt;7926&lt;/td&gt;
      &lt;td&gt;2267&lt;/td&gt;
      &lt;td&gt;3.5x&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;Mpox virus&lt;/td&gt;
      &lt;td&gt;cladei_reservoir&lt;/td&gt;
      &lt;td&gt;60&lt;/td&gt;
      &lt;td&gt;196858&lt;/td&gt;
      &lt;td&gt;3701&lt;/td&gt;
      &lt;td&gt;386&lt;/td&gt;
      &lt;td&gt;9.6x&lt;/td&gt;
    &lt;/tr&gt;
  &lt;/tbody&gt;
&lt;/table&gt;

&lt;p&gt;In general, we would expect likelihood evaluation speed to improve by similar factors (with some allowance for overheads).&lt;/p&gt;

&lt;p&gt;In &lt;a href=&quot;installing&quot;&gt;BEAST X v10.5.0&lt;/a&gt; a new command line option allows this feature to be switched on:&lt;/p&gt;
&lt;div class=&quot;language-bash highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;nt&quot;&gt;-pattern_compression&lt;/span&gt; ambiguous_constant
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;It is important to note that this feature is an approximation to the full likelihood calculation. In testing it appears to be a good approximation, with only small differences in the estimated values of the parameters. A second option, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;-ambiguous_threshold&lt;/code&gt;, specifies the maximum proportion of nucleotides in a pattern that can be ambiguous for the pattern still to be considered constant. This can help reduce the effect of the approximation whilst still allowing considerable savings in run-time. For example, compressing all constant sites irrespective of the proportion of ambiguous nucleotides produces a drop in the log likelihood of the tree (Figure 2, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;all ambiguous constant&lt;/code&gt;). Reducing the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;-ambiguous_threshold&lt;/code&gt; to 0.25 (the default value if not specified) returns a tree likelihood similar to that of the normal compression approach (Figure 2, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;unique patterns&lt;/code&gt;).&lt;/p&gt;

&lt;figure style=&quot;width: 320px;&quot;&gt;&lt;img class=&quot;docimage&quot; src=&quot;images/news/Patterns_TreeLikelihood.png&quot; alt=&quot;&quot; style=&quot;max-width: 320px&quot; /&gt;&lt;figcaption&gt;Figure 2 | Log likelihood of the tree for different compression thresholds. &apos;Uncompressed&apos; is no compression, &apos;unique patterns&apos; has an ambiguity threshold of zero, &apos;all ambiguous constant&apos; has a threshold of 1.0.&lt;/figcaption&gt;&lt;/figure&gt;

&lt;table&gt;
  &lt;thead&gt;
    &lt;tr&gt;
      &lt;th&gt;&lt;strong&gt;Compression&lt;/strong&gt;&lt;/th&gt;
      &lt;th&gt;&lt;strong&gt;Threshold&lt;/strong&gt;&lt;/th&gt;
      &lt;th&gt;&lt;strong&gt;Pattern count&lt;/strong&gt;&lt;/th&gt;
      &lt;th&gt;&lt;strong&gt;Run time&lt;/strong&gt;&lt;/th&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;th&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;-pattern_compression&lt;/code&gt;&lt;/th&gt;
      &lt;th&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;-ambiguous_threshold&lt;/code&gt;&lt;/th&gt;
      &lt;th&gt; &lt;/th&gt;
      &lt;th&gt; &lt;/th&gt;
    &lt;/tr&gt;
  &lt;/thead&gt;
  &lt;tbody&gt;
    &lt;tr&gt;
      &lt;td&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;off&lt;/code&gt;&lt;/td&gt;
      &lt;td&gt;n/a&lt;/td&gt;
      &lt;td&gt;11,029&lt;/td&gt;
      &lt;td&gt; &lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;unique&lt;/code&gt;&lt;/td&gt;
      &lt;td&gt;0.0&lt;/td&gt;
      &lt;td&gt;727&lt;/td&gt;
      &lt;td&gt;11.2 minutes&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;ambiguous_constant&lt;/code&gt;&lt;/td&gt;
      &lt;td&gt;0.25&lt;/td&gt;
      &lt;td&gt;659&lt;/td&gt;
      &lt;td&gt;8.1 minutes&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;ambiguous_constant&lt;/code&gt;&lt;/td&gt;
      &lt;td&gt;0.5&lt;/td&gt;
      &lt;td&gt;608&lt;/td&gt;
      &lt;td&gt;7.8 minutes&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;ambiguous_constant&lt;/code&gt;&lt;/td&gt;
      &lt;td&gt;1.0&lt;/td&gt;
      &lt;td&gt;604&lt;/td&gt;
      &lt;td&gt;7.3 minutes&lt;/td&gt;
    &lt;/tr&gt;
  &lt;/tbody&gt;
&lt;/table&gt;

</description>
            <pubDate>Wed, 17 Jul 2024 00:00:00 +0000</pubDate>
            <link>http://github.com/beast-dev/2024-07-17_pattern_compression.html</link>
            <guid isPermaLink="true">http://github.com/beast-dev/2024-07-17_pattern_compression.html</guid>
            
            <category>news</category>
            
            
        </item>
        
        <item>
            <title>BEAST X (v10.5.0-beta5) released</title>
            <description>&lt;h3 id=&quot;we-are-pleased-to-announce-the-release-of-beast-x-v1050-beta5&quot;&gt;We are pleased to announce the release of BEAST X (v10.5.0-beta5)&lt;/h3&gt;

&lt;div class=&quot;bs-callout bs-callout-&quot;&gt;
    &lt;div style=&quot;width: 100%; display: table;&quot;&gt;
        &lt;div style=&quot;display: table-row&quot;&gt;
            &lt;div style=&quot;width: 1%; display: table-cell; text-align: right&quot;&gt;
                
                    &lt;img src=&quot;/images/icons/beastx-icon.png&quot; style=&quot;max-height: 64px; margin: 0px 10px 0px 10px;&quot; /&gt;
                
            &lt;/div&gt;
            &lt;div style=&quot;width: 70%; display: table-cell; vertical-align: middle;&quot;&gt;
                &lt;div style=&quot;font-size: 150%; padding-top: 40px; margin-top: -45px&quot;&gt;&lt;/div&gt;
                &lt;div style=&quot;font-size: 80%; font-weight: normal; font-style: italic;&quot;&gt;&lt;/div&gt;
                &lt;div style=&quot;vertical-align: middle;&quot;&gt;BEAST X is the new name for the BEAST v1 project, and its first release is &lt;code&gt;v10.5.0&lt;/code&gt;, which supersedes &lt;code&gt;v1.10.4&lt;/code&gt; in the old versioning system. From now on we will use the full major, minor, bugfix style of semantic versioning. Thus, this version is not BEAST 10 but &lt;code&gt;BEAST X v10.5&lt;/code&gt; (the 10th major, 5th minor release of the original BEAST project).&lt;/div&gt;
            &lt;/div&gt;
        &lt;/div&gt;
    &lt;/div&gt;
&lt;/div&gt;

&lt;p&gt;&lt;a href=&quot;installing&quot;&gt;Download BEAST X binaries for Mac, Windows and UNIX/Linux&lt;/a&gt;&lt;/p&gt;

&lt;div class=&quot;alert alert-info&quot; role=&quot;alert&quot;&gt;&lt;i class=&quot;fa fa-info-circle&quot;&gt;&lt;/i&gt; &lt;b&gt;Note:&lt;/b&gt; This is a beta version which is suitable for general use but which may still have issues and bugs.&lt;br /&gt;Please report any issues to &lt;a href=&quot;https://github.com/beast-dev/beast-mcmc/issues&quot;&gt;the BEAST GitHub issue list&lt;/a&gt;&lt;/div&gt;

</description>
            <pubDate>Tue, 09 Jul 2024 00:00:00 +0000</pubDate>
            <link>http://github.com/beast-dev/2024-07-09_BEAST_X_released.html</link>
            <guid isPermaLink="true">http://github.com/beast-dev/2024-07-09_BEAST_X_released.html</guid>
            
            <category>news</category>
            
            
        </item>
        
        <item>
            <title>Developing for BEAST</title>
            <description>&lt;h3 id=&quot;a-less-than-brief-non-comprehensive-introduction-to-beast-development&quot;&gt;A less-than-brief, non-comprehensive, introduction to BEAST development&lt;/h3&gt;
&lt;p&gt;BEAST is scientific software for statistical analyses.
This means there are many things you will eventually need to know.
And, this cannot be stressed enough: you don’t need to know all of this right now!&lt;/p&gt;

&lt;p&gt;The purpose of this document is to provide the reader with a foothold into the many pieces that one needs to know to work with BEAST at a developer level.
It is not a comprehensive introduction to many of those topics, which require treatises in their own right.
But hopefully it will help the reader figure out what to google, or where to look in the code base, to resolve errors.&lt;/p&gt;

&lt;h2 id=&quot;dont-panic&quot;&gt;Don’t panic&lt;/h2&gt;
&lt;p&gt;No one who has contributed to BEAST started out being good at it or knowing how to do all of it.
BEAST is &lt;em&gt;established&lt;/em&gt; software; it has been used for phylogenetic analyses since the early 2000s.
You’re standing on many shoulders, which is at times really great (and at times really frustrating).
You’re also working on a &lt;em&gt;massive&lt;/em&gt; code base.
Don’t expect to know how everything works, though it’s good for your character to at least have a general idea of how phylogenetic models work.&lt;/p&gt;

&lt;h2 id=&quot;intellij-has-your-back&quot;&gt;IntelliJ has your back&lt;/h2&gt;
&lt;p&gt;Don’t write BEAST java code without &lt;a href=&quot;https://www.jetbrains.com/idea/&quot;&gt;IntelliJ&lt;/a&gt;.
Just don’t.
It will make your life much, much easier, and is worth every minute it takes to get acquainted with it.&lt;/p&gt;

&lt;!-- TODO: a longer, better piece about using IntelliJ for BEAST development that expands on this stub --&gt;
&lt;p&gt;Setting up IntelliJ to work with BEAST may take a bit of time (all worth it!).
What follows are some basic tips to help with that.&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;You will need to tell IntelliJ how to &lt;a href=&quot;https://www.jetbrains.com/help/idea/adding-build-file-to-project.html&quot;&gt;build BEAST with ant&lt;/a&gt; in order to incorporate your changes into a working version of BEAST that you can run.&lt;/li&gt;
  &lt;li&gt;You can run BEAST through IntelliJ passively or in debug mode. Either way, you need to set up a Run/Debug configuration.
    &lt;ul&gt;
      &lt;li&gt;In “Add New Configuration” select “Application”.&lt;/li&gt;
      &lt;li&gt;The main class (entered in the corresponding field of the configuration) is &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;dr.app.beast.BeastMain&lt;/code&gt;.&lt;/li&gt;
      &lt;li&gt;Things that would go in the command line after you call BEAST, like the path to the XML, whether to overwrite, and such, go in the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;Program arguments&lt;/code&gt; space.&lt;/li&gt;
      &lt;li&gt;You will want to make sure BEAGLE is accessible. One way this can be done is by choosing &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;Add VM Options&lt;/code&gt; and in that space specifying &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;-Djava.library.path=/usr/local/lib&lt;/code&gt; (or wherever you’ve installed BEAGLE to, if you don’t have root access).&lt;/li&gt;
    &lt;/ul&gt;
  &lt;/li&gt;
  &lt;li&gt;For debugging purposes you will probably want to actually get into the code while it’s being run with &lt;a href=&quot;https://www.jetbrains.com/help/idea/starting-the-debugger-session.html&quot;&gt;breakpoints&lt;/a&gt;. This should work if you’ve appropriately set up a Run/Debug configuration as above. Then you can put breakpoints on any line where you want to halt execution and run in debug mode. Note that execution stops &lt;em&gt;before&lt;/em&gt; the line with the breakpoint (so you can put one on a return statement line), and that the line has to &lt;em&gt;do something&lt;/em&gt; (&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;int someInt;&lt;/code&gt; won’t work, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;int someInt = 0;&lt;/code&gt; will).&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;make-sure-you-work-on-the-right-branch&quot;&gt;Make sure you work on the right branch&lt;/h2&gt;
&lt;!-- TODO: consider a bigger, better intro to git --&gt;
&lt;p&gt;Working with BEAST, like with any large software project, means working with version control.
For BEAST, that means &lt;a href=&quot;https://en.wikipedia.org/wiki/Git&quot;&gt;git&lt;/a&gt; via &lt;a href=&quot;https://github.com/&quot;&gt;GitHub&lt;/a&gt;.
BEAST lives &lt;a href=&quot;https://github.com/beast-dev/beast-mcmc/&quot;&gt;here&lt;/a&gt;, owned by the &lt;a href=&quot;https://github.com/beast-dev/&quot;&gt;beast-dev&lt;/a&gt; group.&lt;/p&gt;

&lt;p&gt;An introduction to git is out of the scope of this overview.
But do be sure to get access privileges to BEAST (become part of beast-dev), and be sure to work on the right branch.
Branches are how big projects can work on multiple things simultaneously, and keep more stable versions of a code base side by side with actively-developed versions.&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;As a general rule, do not work on the master branch. This is for a more stable, less prone to breaking version of BEAST. Touch this &lt;em&gt;after&lt;/em&gt; you know what you’re doing.&lt;/li&gt;
  &lt;li&gt;You may want to make a new branch to work on a particular feature. Choose the branch to make this new branch from carefully. If you want access to other recent work, it’s best to make it from the branch where that is happening. If you want it to merge into master easily and soon, make it from master.&lt;/li&gt;
  &lt;li&gt;Much ongoing work on HMC sampling is being done on the hmc-clock branch.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;xml-java-and-object-oriented-programming&quot;&gt;XML, java, and object-oriented programming&lt;/h2&gt;
&lt;p&gt;If you’re going to actively develop BEAST, you will eventually need to work with two different languages.&lt;/p&gt;

&lt;p&gt;The core of BEAST is written in java, which is an object-oriented programming (OOP) language.
Much of the computational burden of BEAST is outsourced to BEAGLE, which is written in C++ (another OOP language).
Neither of these may be quite as cool as they once were, but that’s not to say learning either of them is a waste.
Computer language skills are like real language skills: the more you learn, the easier it is to pick up new ones.
Which is to say that learning java can help you learn other (possibly more marketable) languages, like python (which is also an OOP language).&lt;/p&gt;

&lt;p&gt;BEAST analyses are specified via XML files.
The typical person using BEAST will generate their XML entirely in BEAUti, much as described in the &lt;a href=&quot;https://beast.community/first_tutorial&quot;&gt;tutorials on the BEAST website&lt;/a&gt;.
Then BEAST will instantiate java objects from that XML and use them to run an analysis.
For such users, the XML file can be more or less an afterthought.
But BEAST is an iceberg.
Much of its functionality lives below the surface that is accessible via BEAUti, and certainly all the new work you do will begin this way as well.
That means you should get comfortable manually editing XML files.
A good learning experience to get started on that is to generate XMLs using BEAUti and then look into the XMLs to see how your data and settings are encoded.&lt;/p&gt;

&lt;p&gt;One thing that java and XMLs have in common is that they are all about &lt;em&gt;objects&lt;/em&gt;.
Object-oriented programming takes some getting used to, but it means that some lessons from XML-writing port to writing java code, and vice versa.&lt;/p&gt;

&lt;h2 id=&quot;writing-xmls&quot;&gt;Writing XMLs&lt;/h2&gt;

&lt;p&gt;Let’s get this out of the way first: XML is not really supposed to be human-readable or human-edited.
But, it’s what we’ve got.&lt;/p&gt;

&lt;p&gt;There are some tutorials that explain how to create custom models in XMLs, like &lt;a href=&quot;https://beast.community/custom_substitution_models&quot;&gt;this one&lt;/a&gt; and &lt;a href=&quot;https://beast.community/markov_modulated&quot;&gt;this one&lt;/a&gt; and many of the Advanced Tutorials.
Tutorials like this can be helpful for getting a general sense of what’s going on and how to work with XML, or at least build a bit of intuition that will come in handy when you start running into issues.
But there are a few things worth considering specifically, which are largely linked to the object-centered nature of an XML.&lt;/p&gt;

&lt;p&gt;Note that there’s a reference of the sorts of things you can put in an XML block &lt;a href=&quot;https://beast.community/xml_reference&quot;&gt;here&lt;/a&gt;.&lt;/p&gt;

&lt;h3 id=&quot;general-tips&quot;&gt;General tips&lt;/h3&gt;
&lt;p&gt;The following are some general-purpose tips that will serve you well from the moment you see your first XML all the way through your ten thousandth.&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;strong&gt;Use a good text editor.&lt;/strong&gt;
Writing XMLs without a good text editor is far more painful than it needs to be.
You’ll be greatly assisted by things like a search/replace tool with good regex (regular expression) functionality, syntax highlighting, auto-closing of XML tags, and more.
You can use IntelliJ for this, or many other editors (for example the perpetually-popular BBEdit and VScode).
Don’t use things like TextEdit, or Notepad+.
Just don’t.&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;Don’t start from scratch.&lt;/strong&gt;
Very few BEAST XML files get made from scratch.
Most start from pre-existing raw material, and many of those started as a BEAUti-generated XML.
BEAUti may not have everything you want in an analysis, but it can give you a functional starting point, and it is always good to start from a file that works.&lt;/li&gt;
  &lt;li&gt;&lt;strong&gt;Recycle (steal) XML blocks.&lt;/strong&gt;
If you don’t know how to format an XML block for something, find an XML file that has that block.
Copy it over, and modify it until it works for you.
This is significantly faster and easier than trying to figure out how to format it based purely on reading through the parser’s code.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3 id=&quot;key-attributes-of-an-xml-object&quot;&gt;Key attributes of an XML object&lt;/h3&gt;
&lt;p&gt;All XML objects open and close with start and end tags, hence the error you may well encounter at some point: &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;The element type &quot;someThing&quot; must be terminated by the matching end-tag &quot;&amp;lt;/someThing&amp;gt;&quot;.&lt;/code&gt;
That often looks like this,&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&amp;lt;someThing&amp;gt;
    code goes here
&amp;lt;/someThing&amp;gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;But some things can be declared in one-liners, which look like this:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&amp;lt;someThing thing=&quot;someValue&quot; option=&quot;whatever&quot;/&amp;gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;where the trailing &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;/&amp;gt;&lt;/code&gt; closes the element.&lt;/p&gt;

&lt;p&gt;Many things can go inside these XML blocks, exactly what depends entirely on the class.
If you want to know what a class should have, you can always check the rules in its parser (an easy way to search for that in IntelliJ is to search for &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;&quot;someThing&quot;&lt;/code&gt;, quotes and all, which should take you to the parser that defines its name on that basis).&lt;/p&gt;

&lt;p&gt;There are two things to be careful about that have burned many a person and cost many hours of time.&lt;/p&gt;

&lt;h4 id=&quot;parsers-only-throw-errors-about-things-that-you-dont-specify&quot;&gt;Parsers only throw errors about things that you &lt;em&gt;don’t&lt;/em&gt; specify.&lt;/h4&gt;
&lt;p&gt;You could throw just about anything into a &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;treeDataLikelihood&lt;/code&gt; object and BEAST won’t care.
A totally unrelated likelihood?
Sure.
An MCMC block?
Okay.
It will simply, and quietly, ignore anything it doesn’t know how to use.
So, always check your spelling for arguments and make sure that the parser actually does something with whatever it is you’re trying to feed in.&lt;/p&gt;

&lt;h4 id=&quot;an-unparsed-object-throws-no-straightforward-errors&quot;&gt;An unparsed object throws no (straightforward) errors&lt;/h4&gt;
&lt;p&gt;This is largely a corollary to the above point.
XML objects get parsed by parsers.
Parsers get loaded by being included in the appropriate files (&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;development_parsers.properties&lt;/code&gt; being the one for development work, more stable things go elsewhere).
If a parser is not loaded, it is not called on the XML object it &lt;em&gt;would&lt;/em&gt; parse.
That object is then not parsed.
No error is thrown saying “hey I don’t know what to do about this XML block.”
Instead, if you’re lucky, you’ll get warnings caused by the non-existence of the object you &lt;em&gt;thought&lt;/em&gt; you had created.&lt;/p&gt;

&lt;p&gt;So when making new parsers for new classes (as discussed more below), be very careful to get that parser into &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;development_parsers.properties&lt;/code&gt;.&lt;/p&gt;
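&lt;p&gt;Registering a parser is just one extra line in that properties file. A sketch (the class name below is made up for illustration, and the exact path may differ between branches):&lt;/p&gt;

```properties
# beast-mcmc/src/dr/app/beast/development_parsers.properties (path approximate)
# one fully-qualified parser class per line; this class name is hypothetical
dr.evomodelxml.substmodel.MyNewModelParser
```

&lt;p&gt;If that line is missing, BEAST will silently skip the XML blocks your parser would have handled, exactly as described above.&lt;/p&gt;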

&lt;h3 id=&quot;key-attributes-of-a-beast-xml&quot;&gt;Key attributes of a BEAST XML&lt;/h3&gt;
&lt;p&gt;A BEAST XML is a definition of a statistical model.
We are defining the &lt;em&gt;parameters&lt;/em&gt; of the model and how they go together.
We must define the prior, the likelihood, and the joint (posterior) target distribution.
We must also specify how parameters are to be sampled in MCMC (via the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;&amp;lt;operators&amp;gt;&lt;/code&gt; block).
Failing to specify any of these things has consequences, some of them potentially disastrous if they go uncaught.
By way of example:&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;A parameter can go unsampled if an operator on it is never specified.&lt;/li&gt;
  &lt;li&gt;The wrong joint posterior distribution may be targeted if a parameter’s prior is not included in the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;&amp;lt;prior&amp;gt;&lt;/code&gt; block or its effect on the likelihood is not accounted for in the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;&amp;lt;likelihood&amp;gt;&lt;/code&gt; block.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The resulting posterior will be incorrect and possibly improper.
Those are bad, so be careful!&lt;/p&gt;

&lt;p&gt;The classic (somewhat weirdly-ordered) workflow of a BEAST XML is more or less:&lt;/p&gt;

&lt;ol&gt;
  &lt;li&gt;Define a parameter at its first use.&lt;/li&gt;
  &lt;li&gt;Put an operator on it in the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;&amp;lt;operators&amp;gt;&lt;/code&gt; block.&lt;/li&gt;
  &lt;li&gt;In the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;&amp;lt;MCMC&amp;gt;&lt;/code&gt; block, put a prior on it in the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;&amp;lt;priors&amp;gt;&lt;/code&gt; (sub)block.&lt;/li&gt;
  &lt;li&gt;Put it in some log file in a &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;&amp;lt;log&amp;gt;&lt;/code&gt; or &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;&amp;lt;logTree&amp;gt;&lt;/code&gt; (sub)block (of the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;&amp;lt;MCMC&amp;gt;&lt;/code&gt; block).&lt;/li&gt;
&lt;/ol&gt;
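&lt;p&gt;As a rough sketch of that workflow (element names here are abbreviated and written from memory of BEAUti-generated XMLs; a real XML written by BEAUti is the authoritative reference):&lt;/p&gt;

```xml
&amp;lt;!-- 1. define the parameter at its first use --&amp;gt;
&amp;lt;hkyModel id="hky"&amp;gt;
  &amp;lt;kappa&amp;gt;
    &amp;lt;parameter id="hky.kappa" value="2.0" lower="0.0"/&amp;gt;
  &amp;lt;/kappa&amp;gt;
  ...
&amp;lt;/hkyModel&amp;gt;

&amp;lt;!-- 2. put an operator on it --&amp;gt;
&amp;lt;operators id="operators"&amp;gt;
  &amp;lt;scaleOperator scaleFactor="0.75" weight="1"&amp;gt;
    &amp;lt;parameter idref="hky.kappa"/&amp;gt;
  &amp;lt;/scaleOperator&amp;gt;
&amp;lt;/operators&amp;gt;

&amp;lt;!-- 3. give it a prior in the mcmc block, and 4. log it --&amp;gt;
&amp;lt;mcmc id="mcmc" chainLength="1000000"&amp;gt;
  &amp;lt;posterior id="posterior"&amp;gt;
    &amp;lt;prior id="prior"&amp;gt;
      &amp;lt;logNormalPrior mean="1.0" stdev="1.25"&amp;gt;
        &amp;lt;parameter idref="hky.kappa"/&amp;gt;
      &amp;lt;/logNormalPrior&amp;gt;
    &amp;lt;/prior&amp;gt;
    &amp;lt;likelihood id="likelihood"&amp;gt;...&amp;lt;/likelihood&amp;gt;
  &amp;lt;/posterior&amp;gt;
  &amp;lt;log logEvery="1000" fileName="run.log"&amp;gt;
    &amp;lt;parameter idref="hky.kappa"/&amp;gt;
  &amp;lt;/log&amp;gt;
&amp;lt;/mcmc&amp;gt;
```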

&lt;h3 id=&quot;there-will-be-bugs&quot;&gt;There will be bugs&lt;/h3&gt;
&lt;p&gt;XML editing is bound to result in errors.
Just like for any other coding job, you will find yourself spending time fixing these bugs.
(As well as time putting them in unintentionally, of course.)
BEAST does try to help you out with this: its error messages often refer to the lines or the specific XML objects where problems were encountered.
Heed those messages, and start your debugging search there.&lt;/p&gt;

&lt;p&gt;When you’re writing your own code for BEAST, you can also gain information from the error messages.
BEAST can tell you what line in what java class ended up provoking a fatal error, and the call stack that led there.
This is very helpful.
And even when writing your own classes, XML bugs remain a leading source of issues when things go wrong.
Never take your XML for granted.&lt;/p&gt;

&lt;h2 id=&quot;designing-and-writing-beast-code&quot;&gt;Designing and writing BEAST code&lt;/h2&gt;
&lt;p&gt;Because of the java-XML dichotomy, to get something made in BEAST that is usable, you will generally need to write both a class (to do what you want) and a parser.&lt;/p&gt;

&lt;h3 id=&quot;parsers&quot;&gt;Parsers&lt;/h3&gt;
&lt;p&gt;A parser in BEAST is a function that parses a specific kind of XML object, like a &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;&amp;lt;treeDataLikelihood&amp;gt;&lt;/code&gt; or a &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;&amp;lt;newick&amp;gt;&lt;/code&gt;.
When designing a parser, important questions to consider include:&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;What, if any, fixed options will a user need to specify or set?&lt;/li&gt;
  &lt;li&gt;More importantly, what other parts of a BEAST model will it need access to?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The Marc Suchard School of BEAST Development says that you should always start by thinking about what you want your parser to look like.
There are some very good arguments in favor:&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;You won’t get very far writing anything if your hypothetical user (who is you in a few hours or weeks) can’t specify a way to access the code you’re going to write.&lt;/li&gt;
  &lt;li&gt;This is one less thing to debug later, helping separate out the many layers where issues can otherwise occur.&lt;/li&gt;
  &lt;li&gt;If you have a working parser, you can start using an example XML and get inside the code with IntelliJ to trace and squash bugs earlier.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;On the other hand, knowing what you need to pass in requires knowing a lot about what you can pass in and how things work. You may not know what is accessible for your needs.
Until you’re sure what works best for you, it may be a good idea to not let either the parser or the actual class at hand get too far ahead of the other.
Minimally, don’t assume you’re done with developing your parser just because it can take in all the objects you think it needs.
The same applies the other way: don’t assume your java class is done just because IntelliJ isn’t throwing any more errors.
You may go through a few rounds of developing both of these before you know exactly what you need in them.&lt;/p&gt;

&lt;h3 id=&quot;classes&quot;&gt;Classes&lt;/h3&gt;
&lt;p&gt;To actually do something new in BEAST, you’ll probably be writing a java class.
Eventually you might need interfaces, which are like entirely abstract classes: they cannot be instantiated themselves, but they define a lot about how an implementing class must behave.
A java class defines an object that will be made at some point when BEAST is called on the XML.
That object can be just about anything and do just about anything: BEAST has classes for parameters, trees, parameter transformations, multiple sequence alignments, the classical phylogenetic likelihood, HMC operators, and much, much more.&lt;/p&gt;

&lt;p&gt;Classes are so general that the advice here is going to have to be pretty general.
The basic structure of a class is that you declare it, define a constructor that sets up a new object, and then write whatever code the object needs to do its job.
IntelliJ is your friend because it is good at knowing what sorts of functions your class will need.&lt;/p&gt;

&lt;p&gt;The &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;extends&lt;/code&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;implements&lt;/code&gt; keywords are ways to relate your class to other classes.
A class can &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;extend&lt;/code&gt; only one other class (which may itself be abstract), while it can &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;implement&lt;/code&gt; any number of interfaces.
Both of these are useful tools for outsourcing a lot of work to stuff others have written.
In cases like this, you may start to see &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;super&lt;/code&gt; get tossed around, which is a way to tell java to let this class’s superclass (parent class) handle something.
Often in BEAST you’ll see this for constructors.&lt;/p&gt;
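&lt;p&gt;A minimal, self-contained sketch of these keywords (toy names, not actual BEAST classes):&lt;/p&gt;

```java
// Illustrative only: these are toy stand-ins, not real BEAST classes.
interface Citable {
    String getCitation();
}

class AbstractModel {
    private final String name;

    AbstractModel(String name) {
        this.name = name;
    }

    String getName() {
        return name;
    }
}

// extend one class, implement as many interfaces as you like
class StrictClockModel extends AbstractModel implements Citable {
    private final double rate;

    StrictClockModel(double rate) {
        super("strictClock"); // let the superclass handle the shared setup
        this.rate = rate;
    }

    double getRate() {
        return rate;
    }

    public String getCitation() {
        return "Someone et al. (some year)";
    }
}

public class ExtendsDemo {
    public static void main(String[] args) {
        StrictClockModel clock = new StrictClockModel(0.001);
        System.out.println(clock.getName() + " rate: " + clock.getRate());
    }
}
```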

&lt;p&gt;Classes can (and usually will) have member objects.
Member objects can be important parts of a class, like the many things that go into a tree model.
Or they can be convenient ways to offload actually doing work into something else.
This is why many classes have some form of likelihood object as a member: it’s much easier (and saves a lot of duplicated code) to say “hey likelihood, what are you now?” than to write out the likelihood again.&lt;/p&gt;

&lt;h4 id=&quot;case-study-hky&quot;&gt;Case study: HKY&lt;/h4&gt;
&lt;p&gt;As a brief case study, let’s take a very fast look at &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;HKY.java&lt;/code&gt; (specifically the version on the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;hmc-clock&lt;/code&gt; branch).
This &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;extends&lt;/code&gt; the class &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;BaseSubstitutionModel&lt;/code&gt; while it &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;implements&lt;/code&gt; several classes, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;Citable&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;ParameterReplaceableSubstitutionModel&lt;/code&gt;, and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;DifferentiableSubstitutionModel&lt;/code&gt;.
Thankfully, most of these do what their names suggest.
One thing we might ask ourselves is whether we really need to know what all of these are.
The answer in many cases will be no, which is quite convenient.&lt;/p&gt;

&lt;p&gt;The class has only one member variable, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;kappa&lt;/code&gt;.
This seems suspicious: where are the stationary frequencies?
Let’s check the constructor.
“Which constructor?” you might ask, as there are two.
That’s easy enough to see here, though.
The first takes in kappa as a double (a fixed value), makes a parameter out of it, and calls the other one.
So the second one is the “real” constructor, and it takes in a kappa parameter and a frequency model.
The real constructor then calls the super class’s constructor.
If we glance at &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;BaseSubstitutionModel&lt;/code&gt;, we see that &lt;em&gt;it&lt;/em&gt; holds onto the stationary frequencies (the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;frequencyModel&lt;/code&gt;).
Mystery solved!&lt;/p&gt;
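&lt;p&gt;The delegation between the two constructors is a common Java idiom worth recognizing. A stripped-down sketch (not the real &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;HKY.java&lt;/code&gt;, whose constructors take more arguments):&lt;/p&gt;

```java
// Toy sketch of the two-constructor pattern (names simplified, not real BEAST code).
class Parameter {
    private final double value;

    Parameter(double value) {
        this.value = value;
    }

    double getValue() {
        return value;
    }
}

public class ToyHKY {
    private final Parameter kappa;

    // convenience constructor: wrap the fixed value, then delegate
    public ToyHKY(double kappaValue) {
        this(new Parameter(kappaValue));
    }

    // the "real" constructor
    public ToyHKY(Parameter kappa) {
        this.kappa = kappa;
        // the real class also calls super(...) here, handing the
        // frequency model off to BaseSubstitutionModel
    }

    public double getKappa() {
        return kappa.getValue();
    }

    public static void main(String[] args) {
        System.out.println(new ToyHKY(2.0).getKappa());
    }
}
```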

&lt;p&gt;Most of the rest of the class does four things:&lt;/p&gt;

&lt;ol&gt;
  &lt;li&gt;Handles the basics of HKY. Computes things about rate matrices, makes sure things get updated when kappa changes, and allows tracking the transition-transversion rate ratio if you really want to.&lt;/li&gt;
  &lt;li&gt;Tells BEAST how to cite the model, living up to its promise that it &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;implements Citable&lt;/code&gt;.&lt;/li&gt;
  &lt;li&gt;A function called &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;public HKY factory(List&amp;lt;Parameter&amp;gt; oldParameters, List&amp;lt;Parameter&amp;gt; newParameters)&lt;/code&gt; which, if you poke into it, will turn out to be living up to its promise that it &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;implements ParameterReplaceableSubstitutionModel&lt;/code&gt;. That interface can be used to support branch-specific model parameters.&lt;/li&gt;
  &lt;li&gt;A lot of functions involved in gradient computation, which only matter when you need to know about that.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;There is also a &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;public static void main(String[] args)&lt;/code&gt; function that serves as a test.&lt;/p&gt;

&lt;p&gt;And this class also highlights the existence of &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;enum&lt;/code&gt;, a lightweight type that you can declare and use inside a class to handle a fixed set of cases without writing tons of if/else statements.&lt;/p&gt;
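&lt;p&gt;A self-contained example of that pattern (the enum here is made up for illustration, not taken from &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;HKY.java&lt;/code&gt;):&lt;/p&gt;

```java
// Sketch: an enum dispatching between a fixed set of cases.
public class EnumDemo {

    enum Nucleotide {
        A, C, G, T;

        boolean isPurine() {
            // a switch on the enum replaces a chain of if/else comparisons
            switch (this) {
                case A:
                case G:
                    return true;
                default:
                    return false;
            }
        }
    }

    public static void main(String[] args) {
        for (Nucleotide n : Nucleotide.values()) {
            System.out.println(n + " purine: " + n.isPurine());
        }
    }
}
```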

&lt;h3 id=&quot;beware-the-model-graph&quot;&gt;Beware the model graph&lt;/h3&gt;
&lt;p&gt;BEAST assembles the components of the statistical model from the XML into a graph.
This allows the various parts of the model to know when something has changed, and therefore whether anything needs to be recomputed.
For example, if you change the stationary frequencies of a GTR model, you expect that BEAST will know that:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;The prior density has changed.&lt;/li&gt;
  &lt;li&gt;A rate matrix has changed, which means the phylogenetic likelihood needs to be recomputed, which means the likelihood has changed.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;There are a number of components of BEAST classes that have to do with this.
These exist in things that are &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;Model&lt;/code&gt;s or &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;Parameter&lt;/code&gt;s and classes that &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;implement&lt;/code&gt; things like &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;ModelListener&lt;/code&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;VariableListener&lt;/code&gt;.
Through the &lt;del&gt;magic&lt;/del&gt;dark voodoo of good software design, you will often be able to do things without going anywhere near any of this.
But sometimes you’ll get a dreaded error, like a likelihood not being the same after a move is rejected, and you’ll have to pay attention.
The following is a non-exhaustive list of functions to pay attention to.
They may be implemented wrong, unimplemented, or unimportant:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;fireModelChanged&lt;/code&gt;&lt;/li&gt;
  &lt;li&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;handleModelChangedEvent&lt;/code&gt;&lt;/li&gt;
  &lt;li&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;fireVariableChanged&lt;/code&gt;&lt;/li&gt;
  &lt;li&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;handleVariableChangedEvent&lt;/code&gt;&lt;/li&gt;
  &lt;li&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;storeState&lt;/code&gt;&lt;/li&gt;
  &lt;li&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;restoreState&lt;/code&gt;&lt;/li&gt;
  &lt;li&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;acceptState&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you’re reading this because you need it, good luck and godspeed.&lt;/p&gt;
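&lt;p&gt;To make the store/restore contract concrete, here is a toy sketch (not actual BEAST code) of what those last functions are for: before a move, the current state is stored; if the move is rejected, the stored state must be restored exactly.&lt;/p&gt;

```java
// Toy sketch of the store/restore contract (not real BEAST code).
public class StoreRestoreDemo {
    private double value;
    private double storedValue;

    public StoreRestoreDemo(double value) {
        this.value = value;
    }

    public void storeState() {
        storedValue = value; // save the current state before a move
    }

    public void restoreState() {
        value = storedValue; // put back exactly what was saved
    }

    public void propose(double newValue) {
        storeState();
        value = newValue;
    }

    public void reject() {
        restoreState();
    }

    public double getValue() {
        return value;
    }

    public static void main(String[] args) {
        StoreRestoreDemo x = new StoreRestoreDemo(1.0);
        x.propose(2.0);
        x.reject();
        // if restoreState were broken, the value here would silently stay 2.0
        System.out.println(x.getValue());
    }
}
```

&lt;p&gt;Bugs in this pattern are exactly what produce the “likelihood not the same after a rejected move” errors mentioned above.&lt;/p&gt;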

&lt;h3 id=&quot;write-tests&quot;&gt;Write tests!&lt;/h3&gt;
&lt;p&gt;When you write something new, you should think very seriously about writing a test to make sure it works.
Maybe a few tests.
It doesn’t matter how braindead simple you think your code is.
Tests are awesome, and tests are your friend.&lt;/p&gt;

&lt;p&gt;At its most basic level, a test executes code and checks the result against a hard-coded (prespecified) value.
Then it either passes and says everything worked, or complains if the values are different (or too different; sometimes you need to allow for some numerical tolerance).
But tests can serve a number of purposes when you’re developing new code.&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;They are debugging tools, to help you find and squash issues. You can write aspirational tests to make sure the code will handle everything you want it to.&lt;/li&gt;
  &lt;li&gt;A test file can serve as a repository of edge-cases, since edge cases often provoke heretofore unseen issues. When something new and weird happens, add it to the pile.&lt;/li&gt;
  &lt;li&gt;Tests prevent future muck-ups. If someone (possibly you in a few weeks or years) alters code and that in turn changes how your code executes, your test will complain about that. This will alert the author of the changes that those changes are having unexpected consequences elsewhere, so they need to make some tweaks.&lt;/li&gt;
  &lt;li&gt;Tests keep &lt;em&gt;you&lt;/em&gt; safe from the worry of messing up existing functionality. (This is just the last point seen from the perspective of the person making changes.) If you’re going to make changes to code, and it isn’t being tested, add tests to check that you don’t break any existing functionality.&lt;/li&gt;
&lt;/ul&gt;
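&lt;p&gt;The basic check-against-a-hard-coded-value pattern described above, with a numerical tolerance, can be sketched like this (the function and reference value here are just an example):&lt;/p&gt;

```java
// Sketch of the basic test pattern: compute something, compare against a
// hard-coded expected value, and allow a small numerical tolerance.
public class ToleranceTestDemo {
    static final double TOLERANCE = 1e-10;

    static double logDensityStandardNormal(double x) {
        return -0.5 * x * x - 0.5 * Math.log(2.0 * Math.PI);
    }

    public static void main(String[] args) {
        double expected = -0.9189385332046727; // hard-coded reference: log density at x = 0
        double actual = logDensityStandardNormal(0.0);
        if (Math.abs(actual - expected) > TOLERANCE) {
            throw new AssertionError("expected " + expected + " but got " + actual);
        }
        System.out.println("test passed");
    }
}
```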

&lt;p&gt;In BEAST, we have two kinds of tests: XML tests and java tests.&lt;/p&gt;

&lt;h4 id=&quot;tests-in-xml&quot;&gt;Tests in XML&lt;/h4&gt;
&lt;p&gt;XML tests live in &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;beast-mcmc/ci/&lt;/code&gt;, mostly in &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;beast-mcmc/ci/TestXML/&lt;/code&gt; (some that need to load information live in &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;beast-mcmc/ci/TestXMLwithLoadState/&lt;/code&gt;).
These tests are all, as you might expect, written as XMLs.
You set up XML objects, then you check that they produce the expected result.
This can test many things, from simple distributions, to complicated likelihoods, to MCMC (the latter can be done without loading a saved state, with judicious usage of &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;fireParameterChanged&lt;/code&gt; XML blocks).
These are the easiest kind of tests to write, and they have the additional benefit of making sure things get parsed acceptably too.&lt;/p&gt;

&lt;p&gt;Many of these tests hinge on the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;&amp;lt;report&amp;gt;&lt;/code&gt; XML block, which calls the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;getReport()&lt;/code&gt; method of a class that &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;implements Reportable&lt;/code&gt;.
Reports are a great way for you to see what a class you’re writing is doing, and can compute and print basically anything to screen that you might want.
When you want to use a report to check against a prespecified value, you can use regex to parse the report and extract what you want.
This will all make more sense when you look at examples and try to start making it work.&lt;/p&gt;
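&lt;p&gt;Extracting a value from a report with regex might look like this sketch (the report string and method here are invented for illustration; real test XMLs wire this up through XML elements rather than standalone java):&lt;/p&gt;

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Sketch: pulling a number out of a report string with a regex,
// the way a test might check a getReport() value.
public class ReportRegexDemo {
    static double extractLikelihood(String report) {
        Pattern p = Pattern.compile("log likelihood = (-?\\d+\\.\\d+)");
        Matcher m = p.matcher(report);
        if (!m.find()) {
            throw new IllegalArgumentException("no likelihood found in report");
        }
        return Double.parseDouble(m.group(1));
    }

    public static void main(String[] args) {
        String report = "treeDataLikelihood: log likelihood = -1234.5678";
        System.out.println(extractLikelihood(report));
    }
}
```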

&lt;p&gt;You can run any of these tests by calling BEAST on the XML.&lt;/p&gt;

&lt;p&gt;Note that “ci” stands for Continuous Integration because these tests are run as part of &lt;a href=&quot;https://docs.github.com/en/actions/automating-builds-and-tests/about-continuous-integration&quot;&gt;GitHub’s continuous integration&lt;/a&gt;.
The upshot is that whenever anything is pushed to the BEAST repository on GitHub, the tests are run (this is specified &lt;a href=&quot;https://github.com/beast-dev/beast-mcmc/blob/master/.github/workflows/ci.yml&quot;&gt;here&lt;/a&gt;).
This means that if you break something you’ll find out pretty quickly.
That’s good!
As bad as an “oops you broke BEAST” message feels, it’s much better to know now than after that has caused analyses to be wrong.&lt;/p&gt;

&lt;h4 id=&quot;tests-in-java&quot;&gt;Tests in java&lt;/h4&gt;
&lt;p&gt;Sometimes, you can’t quite fit what you want or need to test into an XML.
This can happen when you’re working with code that lives deeper in the code base.
But even some things that you use in XMLs are hard to test that way (a &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;transform&lt;/code&gt; block doesn’t know what variables go in it, making it hard to test directly in XML).
In these cases, you may need the more powerful and flexible internal unit testing done entirely in java.
These tests live in &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;beast-mcmc/src/test/dr/&lt;/code&gt; and work by creating java objects directly, then checking that functions operating on those objects produce the expected results.&lt;/p&gt;

&lt;p&gt;You can run these unit tests directly in IntelliJ, which will tell you which (if any) of the tests in a unit test file pass and fail.&lt;/p&gt;

&lt;p&gt;These tests are executed when BEAST is built by ant (as noted in &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;beast-mcmc/build.xml&lt;/code&gt;).
So if your changes to some code break something else that has a unit test, you’ll know as soon as you try to compile the code to run it.&lt;/p&gt;

&lt;p&gt;Note that the file structure of where these tests live mimics &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;beast-mcmc/src/dr&lt;/code&gt;.&lt;/p&gt;

&lt;h3 id=&quot;a-brief-overview-of-some-guiding-principles&quot;&gt;A brief overview of some guiding principles&lt;/h3&gt;
&lt;p&gt;There are many tips for good software design; here are a few.&lt;/p&gt;

&lt;h4 id=&quot;dont-reinvent-the-wheel&quot;&gt;Don’t reinvent the wheel!&lt;/h4&gt;
&lt;p&gt;BEAST is old enough that many things you might want to do have already been done.
Before going out and writing something from scratch, see if it exists first!&lt;/p&gt;

&lt;h4 id=&quot;recycle-dont-rewrite&quot;&gt;Recycle, don’t rewrite&lt;/h4&gt;
&lt;p&gt;Many times you will find what you want to do is a small modification of something else.
You could add your code directly to the existing class.
But if you instead make a new class that &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;extends&lt;/code&gt; the pre-existing class, you may have an easier time.
And you certainly won’t have to worry about messing up the existing functionality of that class.&lt;/p&gt;

&lt;h4 id=&quot;kiss&quot;&gt;KISS&lt;/h4&gt;
&lt;p&gt;KISS is a great &lt;a href=&quot;https://nl.wikipedia.org/wiki/Kiss_(band)&quot;&gt;band&lt;/a&gt; but it also stands for &lt;a href=&quot;https://en.wikipedia.org/wiki/KISS_principle&quot;&gt;keep it simple, stupid&lt;/a&gt;.
Don’t make things more complex than they need to be.&lt;/p&gt;

&lt;h4 id=&quot;keep-an-eye-on-generality&quot;&gt;Keep an eye on generality&lt;/h4&gt;
&lt;p&gt;Sometimes it’s easier to solve not just the problem at hand but a general class of problems.
Sometimes you know you’ve got an extension coming down the line.
Sometimes you just want to future-proof your work.
For any and all of these reasons, it’s good to do things generally when possible (but &lt;em&gt;done&lt;/em&gt; and working code is better than hypothetical code).&lt;/p&gt;

&lt;h4 id=&quot;test-early-and-test-often&quot;&gt;Test early and test often&lt;/h4&gt;
&lt;p&gt;Tests are a developer’s best friend.
Write them.
Use them.
Love them.&lt;/p&gt;

</description>
            <pubDate>Tue, 08 Aug 2023 00:00:00 +0000</pubDate>
            <link>http://github.com/beast-dev/beast_development_introduction.html</link>
            <guid isPermaLink="true">http://github.com/beast-dev/beast_development_introduction.html</guid>
            
            <category>article</category>
            
            
        </item>
        
        <item>
            <title>Ebola Virus Local Clock Analysis</title>
            <description>
&lt;p&gt;In 2014 there was an outbreak of Ebola virus disease in the Democratic Republic of Congo. 
When the first genome sequences were published (Maganga &lt;em&gt;et al.&lt;/em&gt; 2014) it was noticed that the amount of divergence from the earliest EBOV genomes from the 1970s was considerably less than for the West African epidemic genomes which were from the same year. 
This suggested that the DRC lineage had exhibited a substantially lower rate of evolution (Lam &lt;em&gt;et al.&lt;/em&gt; 2015). 
Lam &lt;em&gt;et al.&lt;/em&gt; speculated that this may be due to it being in a different host species with different evolutionary forces at work.&lt;/p&gt;

&lt;p&gt;However, during the West African outbreak, a number of examples of long-term latency were observed where someone who had recovered from EVD months later transmitted the virus to another individual – usually a sexual partner (Blackley &lt;em&gt;et al.&lt;/em&gt; 2016). 
In the most extreme example, there was a 15 month interval between the acute infection and the onward transmission (Diallo &lt;em&gt;et al.&lt;/em&gt; 2016). 
It was noticed that these cases were often associated with a short branch length suggesting a reduced rate of evolution or a form of latency with reduced replication for much of the period. 
This suggests that a similar process could be at work for EBOV in the non-human animal hosts over longer timescales.&lt;/p&gt;

&lt;p&gt;Sequences from the last 3 DRC outbreaks (in 2017, summer 2018 and the currently ongoing one in the North East of the country) also exhibit this apparently reduced branch length. 
&lt;a href=&quot;http://virological.org/t/drc-2018-viral-genome-characterization/230&quot;&gt;See this post for a tree produced by the INRB and USAMRIID that shows this effect&lt;/a&gt; and also &lt;a href=&quot;https://doi.org/10.1016/S1473-3099(19)30118-5&quot;&gt;Mbala-Kingebeni &lt;em&gt;et al.&lt;/em&gt; 2019b&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;To explore EBOV rate variation in non-human hosts, we assembled a data set of genomes that spans the known history of the virus. Most EBOV genomes have been sampled from human cases, so we have included one genome per outbreak, preferring those with precise dates of sampling. 
A list of sequences used is given in Table 1 along with their Genbank accession numbers and, where available, a reference to the published work describing them.&lt;/p&gt;

&lt;div class=&quot;small-text&quot;&gt;

  &lt;table&gt;
    &lt;thead&gt;
      &lt;tr&gt;
        &lt;th&gt;&lt;strong&gt;accession&lt;/strong&gt;&lt;/th&gt;
        &lt;th&gt;&lt;strong&gt;country&lt;/strong&gt;&lt;/th&gt;
        &lt;th&gt;&lt;strong&gt;name&lt;/strong&gt;&lt;/th&gt;
        &lt;th&gt;&lt;strong&gt;date&lt;/strong&gt;&lt;/th&gt;
        &lt;th&gt;&lt;strong&gt;outbreak&lt;/strong&gt;&lt;/th&gt;
        &lt;th&gt;&lt;strong&gt;reference&lt;/strong&gt;&lt;/th&gt;
      &lt;/tr&gt;
    &lt;/thead&gt;
    &lt;tbody&gt;
      &lt;tr&gt;
        &lt;td&gt;&lt;a href=&quot;https://www.ncbi.nlm.nih.gov/nuccore/KR063671&quot;&gt;KR063671&lt;/a&gt;&lt;/td&gt;
        &lt;td&gt;DRC&lt;/td&gt;
        &lt;td&gt;Yambuku-Mayinga&lt;/td&gt;
        &lt;td&gt;1976-10-01&lt;/td&gt;
        &lt;td&gt;Yambuku/1976&lt;/td&gt;
        &lt;td&gt; &lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
        &lt;td&gt;&lt;a href=&quot;https://www.ncbi.nlm.nih.gov/nuccore/KC242791&quot;&gt;KC242791&lt;/a&gt;&lt;/td&gt;
        &lt;td&gt;DRC&lt;/td&gt;
        &lt;td&gt;Bonduni&lt;/td&gt;
        &lt;td&gt;1977-06&lt;/td&gt;
        &lt;td&gt;Bonduni/1977&lt;/td&gt;
        &lt;td&gt;&lt;a href=&quot;https://www.ncbi.nlm.nih.gov/pubmed/23255795&quot;&gt;Carroll et al. 2013&lt;/a&gt;&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
        &lt;td&gt;&lt;a href=&quot;https://www.ncbi.nlm.nih.gov/nuccore/KC242792&quot;&gt;KC242792&lt;/a&gt;&lt;/td&gt;
        &lt;td&gt;GAB&lt;/td&gt;
        &lt;td&gt;Gabon&lt;/td&gt;
        &lt;td&gt;1994-12-27&lt;/td&gt;
        &lt;td&gt;Minkebe/1994&lt;/td&gt;
        &lt;td&gt;&lt;a href=&quot;https://www.ncbi.nlm.nih.gov/pubmed/23255795&quot;&gt;Carroll et al. 2013&lt;/a&gt;&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
        &lt;td&gt;&lt;a href=&quot;https://www.ncbi.nlm.nih.gov/nuccore/KU182905&quot;&gt;KU182905&lt;/a&gt;&lt;/td&gt;
        &lt;td&gt;DRC&lt;/td&gt;
        &lt;td&gt;Kikwit-9510621&lt;/td&gt;
        &lt;td&gt;1995-05-04&lt;/td&gt;
        &lt;td&gt;Kikwit/1995&lt;/td&gt;
        &lt;td&gt; &lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
        &lt;td&gt;&lt;a href=&quot;https://www.ncbi.nlm.nih.gov/nuccore/KC242793&quot;&gt;KC242793&lt;/a&gt;&lt;/td&gt;
        &lt;td&gt;GAB&lt;/td&gt;
        &lt;td&gt;1Eko&lt;/td&gt;
        &lt;td&gt;1996-02&lt;/td&gt;
        &lt;td&gt;Mayibout/1996&lt;/td&gt;
        &lt;td&gt;&lt;a href=&quot;https://www.ncbi.nlm.nih.gov/pubmed/23255795&quot;&gt;Carroll et al. 2013&lt;/a&gt;&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
        &lt;td&gt;&lt;a href=&quot;https://www.ncbi.nlm.nih.gov/nuccore/KC242798&quot;&gt;KC242798&lt;/a&gt;&lt;/td&gt;
        &lt;td&gt;GAB&lt;/td&gt;
        &lt;td&gt;1Ikot&lt;/td&gt;
        &lt;td&gt;1996-10-27&lt;/td&gt;
        &lt;td&gt;Booue/1996&lt;/td&gt;
        &lt;td&gt;&lt;a href=&quot;https://www.ncbi.nlm.nih.gov/pubmed/23255795&quot;&gt;Carroll et al. 2013&lt;/a&gt;&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
        &lt;td&gt;&lt;a href=&quot;https://www.ncbi.nlm.nih.gov/nuccore/KC242800&quot;&gt;KC242800&lt;/a&gt;&lt;/td&gt;
        &lt;td&gt;GAB&lt;/td&gt;
        &lt;td&gt;Ilembe&lt;/td&gt;
        &lt;td&gt;2002-02-23&lt;/td&gt;
        &lt;td&gt;Mekambo/2001&lt;/td&gt;
        &lt;td&gt;&lt;a href=&quot;https://www.ncbi.nlm.nih.gov/pubmed/23255795&quot;&gt;Carroll et al. 2013&lt;/a&gt;&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
        &lt;td&gt;&lt;a href=&quot;https://www.ncbi.nlm.nih.gov/nuccore/KF113529&quot;&gt;KF113529&lt;/a&gt;&lt;/td&gt;
        &lt;td&gt;COG&lt;/td&gt;
        &lt;td&gt;Kelle_2&lt;/td&gt;
        &lt;td&gt;2003-10&lt;/td&gt;
        &lt;td&gt;Mbomo/2003&lt;/td&gt;
        &lt;td&gt;Chiu et al. 2013&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
        &lt;td&gt;&lt;a href=&quot;https://www.ncbi.nlm.nih.gov/nuccore/HQ613403&quot;&gt;HQ613403&lt;/a&gt;&lt;/td&gt;
        &lt;td&gt;DRC&lt;/td&gt;
        &lt;td&gt;M-M&lt;/td&gt;
        &lt;td&gt;2007-08-31&lt;/td&gt;
        &lt;td&gt;Luebo/2007&lt;/td&gt;
        &lt;td&gt;&lt;a href=&quot;http://doi.org/10.1093/infdis/jir364&quot;&gt;Grard et al. 2011&lt;/a&gt;&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
        &lt;td&gt;&lt;a href=&quot;https://www.ncbi.nlm.nih.gov/nuccore/HQ613402&quot;&gt;HQ613402&lt;/a&gt;&lt;/td&gt;
        &lt;td&gt;DRC&lt;/td&gt;
        &lt;td&gt;034-KS&lt;/td&gt;
        &lt;td&gt;2008-12-31&lt;/td&gt;
        &lt;td&gt;Luebo/2008&lt;/td&gt;
        &lt;td&gt;&lt;a href=&quot;http://doi.org/10.1093/infdis/jir364&quot;&gt;Grard et al. 2011&lt;/a&gt;&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
        &lt;td&gt;&lt;a href=&quot;https://www.ncbi.nlm.nih.gov/nuccore/KJ660347&quot;&gt;KJ660347&lt;/a&gt;&lt;/td&gt;
        &lt;td&gt;GIN&lt;/td&gt;
        &lt;td&gt;Makona-Gueckedou-C07&lt;/td&gt;
        &lt;td&gt;2014-03-20&lt;/td&gt;
        &lt;td&gt;West_Africa/2013&lt;/td&gt;
        &lt;td&gt;&lt;a href=&quot;https://www.ncbi.nlm.nih.gov/pubmed/24738640&quot;&gt;Baize et al. 2014&lt;/a&gt;&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
        &lt;td&gt;&lt;a href=&quot;https://www.ncbi.nlm.nih.gov/nuccore/KP271018&quot;&gt;KP271018&lt;/a&gt;&lt;/td&gt;
        &lt;td&gt;DRC&lt;/td&gt;
        &lt;td&gt;Lomela-Lokolia16&lt;/td&gt;
        &lt;td&gt;2014-08-20&lt;/td&gt;
        &lt;td&gt;Boende-Lokolia/2014&lt;/td&gt;
        &lt;td&gt;Naccache et al. 2014&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
        &lt;td&gt;&lt;a href=&quot;https://www.ncbi.nlm.nih.gov/nuccore/MH613311&quot;&gt;MH613311&lt;/a&gt;&lt;/td&gt;
        &lt;td&gt;DRC&lt;/td&gt;
        &lt;td&gt;Muembe.1&lt;/td&gt;
        &lt;td&gt;2017-05-07&lt;/td&gt;
        &lt;td&gt;Likati/2017&lt;/td&gt;
        &lt;td&gt;&lt;a href=&quot;https://doi.org/10.1093/infdis/jiz107&quot;&gt;Nsio et al. 2018&lt;/a&gt;&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
        &lt;td&gt;&lt;a href=&quot;https://www.ncbi.nlm.nih.gov/nuccore/MH733477&quot;&gt;MH733477&lt;/a&gt;&lt;/td&gt;
        &lt;td&gt;DRC&lt;/td&gt;
        &lt;td&gt;Tumba-BIK009&lt;/td&gt;
        &lt;td&gt;2018-05-10&lt;/td&gt;
        &lt;td&gt;Bikoro-Mbandaka/2018&lt;/td&gt;
        &lt;td&gt;&lt;a href=&quot;https://doi.org/10.1016/S1473-3099(19)30124-0&quot;&gt;Mbala-Kingebeni et al. 2019a&lt;/a&gt;&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
        &lt;td&gt;&lt;a href=&quot;https://www.ncbi.nlm.nih.gov/nuccore/MK007330&quot;&gt;MK007330&lt;/a&gt;&lt;/td&gt;
        &lt;td&gt;DRC&lt;/td&gt;
        &lt;td&gt;Ituri-18FHV090&lt;/td&gt;
        &lt;td&gt;2018-07-28&lt;/td&gt;
        &lt;td&gt;Kivu/2018&lt;/td&gt;
        &lt;td&gt;&lt;a href=&quot;https://doi.org/10.1016/S1473-3099(19)30118-5&quot;&gt;Mbala-Kingebeni et al. 2019b&lt;/a&gt;&lt;/td&gt;
      &lt;/tr&gt;
    &lt;/tbody&gt;
  &lt;/table&gt;

&lt;/div&gt;

&lt;blockquote&gt;
  &lt;p&gt;&lt;strong&gt;Table 1&lt;/strong&gt; A list of the genomes used in this post and their references.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Building a maximum likelihood tree of these genomes shows the apparent slow-down in the recent lineages (Figure 1; yellow dots). 
A root-to-tip regression (the line is fitted only to the green dots) shows how far below the expected line these are (this is similar to Figure 5 in &lt;a href=&quot;https://doi.org/10.1016/S1473-3099(19)30118-5&quot;&gt;Mbala-Kingebeni et al. 2019b&lt;/a&gt;).&lt;/p&gt;
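&lt;p&gt;For anyone wanting to reproduce this kind of plot, a root-to-tip regression is just an ordinary least-squares fit of divergence against sampling date; a minimal sketch (using made-up divergence values purely for illustration, not the data behind Figure 1):&lt;/p&gt;

```python
# Sketch of a root-to-tip regression: ordinary least squares of root-to-tip
# divergence against sampling date. The slope estimates the substitution rate;
# the x-intercept estimates the root date. The values below are illustrative
# only; they are not the genomes in Table 1.

def fit_line(xs, ys):
    """Least-squares fit returning (slope, intercept)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, my - slope * mx

dates = [1976.5, 1995.4, 2007.7, 2014.2, 2018.4]       # decimal years
divergence = [0.0004, 0.0155, 0.0253, 0.0306, 0.0340]  # subs/site from root

rate, intercept = fit_line(dates, divergence)
root_date = -intercept / rate   # where the fitted line crosses zero divergence
print(f"rate = {rate:.2e} subs/site/year, root around {root_date:.0f}")
```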

&lt;iframe src=&quot;https://rambaut.github.io/figtree.js/ebov.html&quot; style=&quot;width: 1500px; height: 450px; border: 0px&quot;&gt;&lt;/iframe&gt;

&lt;blockquote&gt;
  &lt;p&gt;&lt;strong&gt;Figure 1.&lt;/strong&gt; A tree and root-to-tip plot for the 15 Ebola virus genomes in Table 1.
This is an interactive figure: click the points to include/exclude them from the regression. 
The yellow tips are not included in the regression.
Click on a branch of the tree to re-root the tree at that position.
&lt;a href=&quot;https://github.com/rambaut/figtree.js/&quot;&gt;The source code for this figure is available here&lt;/a&gt;.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2 id=&quot;methods&quot;&gt;Methods&lt;/h2&gt;

&lt;p&gt;To characterise this effect, we have used relaxed-clock models in BEAST to allow different rates of evolution for different parts of the tree.&lt;/p&gt;

&lt;figure style=&quot; width: 320;&quot;&gt;&lt;img class=&quot;docimage&quot; src=&quot;images/news/EBOV_Reference_Set_15_iqtree_highlighted.png&quot; alt=&quot;&quot; style=&quot;max-width: 320&quot; /&gt;&lt;/figure&gt;

&lt;blockquote&gt;
  &lt;p&gt;&lt;strong&gt;Figure 2.&lt;/strong&gt; A maximum likelihood tree of the 15 EBOV genomes with the ‘slow’ clades highlighted.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Two lineages are identified as having lower than expected divergence by &lt;a href=&quot;https://doi.org/10.1016/S1473-3099(19)30118-5&quot;&gt;Mbala-Kingebeni et al. 2019b&lt;/a&gt; (see Figure 2):&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;One comprising the outbreak in 2017 in Likati and the on-going 2018-2019 outbreak in North Kivu Province – represented by the genomes &lt;em&gt;Muembe.1&lt;/em&gt; and &lt;em&gt;Ituri-18FHV090&lt;/em&gt;.&lt;/li&gt;
  &lt;li&gt;The other comprising the outbreak in 2014 in Lokolia and the 2018 outbreak in Équateur Province – represented by the genomes &lt;em&gt;Lomela-Lokolia16&lt;/em&gt; and &lt;em&gt;Tumba-BIK009&lt;/em&gt;.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;First, we used the local clock model, which allows us to specify which parts of the tree have different rates (although it does not specify which parts are fast and which are slow).
This allows us to assign a different rate of evolution to each of the two lineages described above (including the ‘stem’ branch leading to each clade).&lt;/p&gt;

&lt;p&gt;This model was used in &lt;a href=&quot;https://doi.org/10.1016/S1473-3099(19)30118-5&quot;&gt;Mbala-Kingebeni et al. (2019b)&lt;/a&gt; where these two lineages are labelled &lt;em&gt;clade a&lt;/em&gt; (“EBOV/Tum” &amp;amp; “EBOV/Lom”) and &lt;em&gt;clade c&lt;/em&gt; (“EBOV/Muy” &amp;amp; “EBOV/Itu”), respectively (Figure S8 of the Supplementary information). 
This paper shows that both these clades have a lower rate of evolution overall (Figure S8B).&lt;/p&gt;

&lt;p&gt;As a comparison we also ran the analysis with a strict molecular clock (which assumes a single rate over the whole tree) and a log-normal uncorrelated relaxed clock (which allows each branch to have a different rate, independently drawn from a log-normal distribution). 
We also ran a strict molecular clock analysis excluding the recent DRC outbreak genomes.&lt;/p&gt;

&lt;p&gt;For all of these analyses we constrained the tree topology so that all of the viruses sampled after the 1970s were monophyletic, to maintain a consistent rooting. 
This was the rooting suggested by a much earlier analysis (Dudas and Rambaut 2014).&lt;/p&gt;

&lt;p&gt;Analysis was done by partitioning the genomes into 1st, 2nd &amp;amp; 3rd codon positions for the concatenated protein-coding regions, plus a 4th partition comprising the concatenated intergenic regions. 
Each partition was given an HKY model with gamma-distributed rate variation among sites, and the parameters for each partition were unlinked.&lt;/p&gt;

&lt;div class=&quot;alert alert-success&quot; role=&quot;alert&quot;&gt;&lt;i class=&quot;fa fa-download fa-lg&quot;&gt;&lt;/i&gt; &lt;a href=&quot;/files/EBOV_local_clock_XMLs.zip&quot;&gt;XML files for all the analyses are available here.&lt;/a&gt;
&lt;/div&gt;

&lt;h2 id=&quot;results&quot;&gt;Results&lt;/h2&gt;

&lt;p&gt;For the local clock model (Figure 3), you can see the two lineages that have been allowed a different rate; both have a slower rate than the rest of the tree (i.e. the branches are coloured by rate, with blue meaning lower than average). 
This and all the subsequent trees are drawn on the same timescale to allow a comparison of the depth of the trees.&lt;/p&gt;

&lt;figure style=&quot; width: 100%;&quot;&gt;&lt;img class=&quot;docimage&quot; src=&quot;images/news/EBOV_Reference_Set_15_LC1.MCC.tree.png&quot; alt=&quot;&quot; style=&quot;max-width: 100%&quot; /&gt;&lt;/figure&gt;

&lt;blockquote&gt;
  &lt;p&gt;&lt;strong&gt;Figure 3.&lt;/strong&gt; A local clock tree where two lineages, identified &lt;em&gt;a priori&lt;/em&gt;, are allowed to evolve at a different rate. The branches are coloured by rate with blue meaning lower than average, red higher. Green bars represent 95% credible intervals for the date of the node.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Secondly, for the relaxed clock tree (Figure 4), you can see that there is variation in rate across the tree (the clades of interest do, however, have the slowest rates). 
Note however that the age of the root of the tree is further back in time and the HPD bar spans nearly 4 decades. 
Essentially the lognormal distribution is struggling to adequately describe the variation in rates given the extreme outliers seen in Figure 3.&lt;/p&gt;

&lt;figure style=&quot; width: 100%;&quot;&gt;&lt;img class=&quot;docimage&quot; src=&quot;images/news/EBOV_Reference_Set_15_UCLN.MCC.tree.png&quot; alt=&quot;&quot; style=&quot;max-width: 100%&quot; /&gt;&lt;/figure&gt;

&lt;blockquote&gt;
  &lt;p&gt;&lt;strong&gt;Figure 4.&lt;/strong&gt; The uncorrelated lognormal (UCLN) relaxed clock model. The rate for each branch is inferred independently with no &lt;em&gt;a priori&lt;/em&gt; structure imposed.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Finally, as a comparison, here is the strict molecular clock tree with the same rate over the whole tree (Figure 5). 
Once again, the root of the tree is much older than the local clock model and the relative branch lengths are very different.&lt;/p&gt;

&lt;figure style=&quot; width: 100%;&quot;&gt;&lt;img class=&quot;docimage&quot; src=&quot;images/news/EBOV_Reference_Set_15_SC.MCC.tree.png&quot; alt=&quot;&quot; style=&quot;max-width: 100%&quot; /&gt;&lt;/figure&gt;

&lt;blockquote&gt;
  &lt;p&gt;&lt;strong&gt;Figure 5.&lt;/strong&gt; The strict molecular clock model with a single rate describing the whole tree.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2 id=&quot;refining-the-model&quot;&gt;Refining the model&lt;/h2&gt;

&lt;p&gt;Looking at the relaxed clock tree in Figure 4, we notice that for the two clades of interest, the tip branches seem to have a higher rate than the stem branches (they are less blue and they are shorter than in the local clock model). 
This suggests another possibility — that it is not the whole clade that has a lower rate of evolution but just the branch leading to the common ancestor of the pair. 
This makes more sense if this is being produced by a process of latency (i.e., a switch between active replication and no replication). 
This would mean that, parsimoniously, there were just these two branches where the virus was latent for some period of time.
We would assume that an internal node in the tree represents active replication and epidemiological spread and thus the virus being in the non-latent state. 
Thus it is unlikely that a whole clade and stem exhibits latency (unless the propensity to latency increased on the stem lineage).&lt;/p&gt;

&lt;figure style=&quot; width: 320;&quot;&gt;&lt;img class=&quot;docimage&quot; src=&quot;images/news/EBOV_Reference_Set_15_iqtree_highlighted_stem.png&quot; alt=&quot;&quot; style=&quot;max-width: 320&quot; /&gt;&lt;/figure&gt;

&lt;blockquote&gt;
  &lt;p&gt;&lt;strong&gt;Figure 6.&lt;/strong&gt; The two stem branches given a different rate of evolution in the refined local clock model (the rest of the tree is assumed to have the same rate, including the tip branches of the two clades identified in Figure 2).&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;To examine this we can set up a new local clock model in which just the internal stem branches are given a different rate of evolution, with the tip branches of the two clades having the same rate as the rest of the tree (Figure 6).&lt;/p&gt;

&lt;figure style=&quot; width: 100%;&quot;&gt;&lt;img class=&quot;docimage&quot; src=&quot;images/news/EBOV_Reference_Set_15_LC2.MCC.tree.png&quot; alt=&quot;&quot; style=&quot;max-width: 100%&quot; /&gt;&lt;/figure&gt;

&lt;blockquote&gt;
  &lt;p&gt;&lt;strong&gt;Figure 7.&lt;/strong&gt; Stem branch only local clock model. Only the stem branches above the two clades of interest are allowed different rates of evolution.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;In comparison with the clade-specific local clock (Figure 3), the most recent common ancestors of the Muembe.1/18FHV090 pair and Lokolia/Bikoro pair are much more recent. Other than that, the trees are very similar. The rates of evolution on the two stem lineages are even slower (more blue). We compare the actual values of these rates in Figure 9.&lt;/p&gt;

&lt;p&gt;Looking at the average rate of evolution over the whole tree (Figure 8) shows that the slow-down in the two lineages affects the strict clock to a much greater degree than the relaxed clocks.&lt;/p&gt;

&lt;figure style=&quot; width: 450px;&quot;&gt;&lt;img class=&quot;docimage&quot; src=&quot;images/news/Mean_rate_LC1_LC2_SC_UCLN.png&quot; alt=&quot;&quot; style=&quot;max-width: 450px&quot; /&gt;&lt;/figure&gt;

&lt;blockquote&gt;
  &lt;p&gt;&lt;strong&gt;Figure 8.&lt;/strong&gt; Box-and-whisker plot of the mean rates of evolution across all four models.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;But if we look at the local clock models and compare the rates for the Likati/North Kivu and Lokolia/Équateur clades and the respective stem branches (Figure 9), we see the slow rates (the stem-only model gives an even slower rate for this one branch, supporting the idea that this is the branch that experienced some ‘latency’). Interestingly, the rates for the two stem branches are even lower than those of the clades and very similar to each other (whereas the rates for the clades differ because they include a mixture of fast and slow branches for different amounts of time).&lt;/p&gt;

&lt;figure style=&quot; width: 450px;&quot;&gt;&lt;img class=&quot;docimage&quot; src=&quot;images/news/Local_rates_LC1_LC2.png&quot; alt=&quot;&quot; style=&quot;max-width: 450px&quot; /&gt;&lt;/figure&gt;

&lt;blockquote&gt;
  &lt;p&gt;&lt;strong&gt;Figure 9.&lt;/strong&gt; Box-and-whisker plot of the estimated rates for the two local clock model variants. The rate labelled ‘Tree’ is the rate for the rest of the tree (excluding the local clocks), then the rates for the Likati/North Kivu and Lokolia/Équateur lineages when the whole clade is included and then only the stem lineages.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Finally we ran the strict clock model on a data set where we omitted the four most recent DRC genomes that are involved in the apparent slow-down in rates (the last 4 sequences in Table 1). We compared this rate with the two local clock models for the rate of evolution estimated for the parts of the tree that are not included in the local clocks (the red branches in Figures 3 and 7).&lt;/p&gt;

&lt;figure style=&quot; width: 450px;&quot;&gt;&lt;img class=&quot;docimage&quot; src=&quot;images/news/Primary_rate_LC1_LC2_SC_SC11.png&quot; alt=&quot;&quot; style=&quot;max-width: 450px&quot; /&gt;&lt;/figure&gt;

&lt;blockquote&gt;
  &lt;p&gt;&lt;strong&gt;Figure 10.&lt;/strong&gt; Box-and-whisker plot of the estimated rate for the tree (excluding the local clock rates for these models) in comparison to the strict clock rate and the rate for a strict clock on a data set that excludes the 4 recent DRC genomes (i.e., excluding the lineages that are exhibiting slow downs).&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2 id=&quot;model-selection&quot;&gt;Model selection&lt;/h2&gt;

&lt;p&gt;BEAST implements a number of related approaches for comparing competing models (&lt;a href=&quot;/model_selection_1&quot;&gt;see here for some detailed instructions on applying these&lt;/a&gt;). These compute a marginal likelihood estimate - essentially a goodness-of-fit which takes into account the complexity of the models. The ratio of these provides a Bayes factor - a measure of the relative ‘plausibility’ of the models given the data.&lt;/p&gt;

&lt;table&gt;
  &lt;thead&gt;
    &lt;tr&gt;
      &lt;th&gt;&lt;strong&gt;model&lt;/strong&gt;&lt;/th&gt;
      &lt;th&gt;&lt;strong&gt;log MLE&lt;/strong&gt;&lt;/th&gt;
      &lt;th&gt;&lt;strong&gt;log Bayes factor&lt;/strong&gt;&lt;/th&gt;
    &lt;/tr&gt;
  &lt;/thead&gt;
  &lt;tbody&gt;
    &lt;tr&gt;
      &lt;td&gt;strict clock&lt;/td&gt;
      &lt;td&gt;-33661.96&lt;/td&gt;
      &lt;td&gt;—&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;clade local clock&lt;/td&gt;
      &lt;td&gt;-33566.74&lt;/td&gt;
      &lt;td&gt;95.22 (vs. strict clock)&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;stem local clock&lt;/td&gt;
      &lt;td&gt;-33563.86&lt;/td&gt;
      &lt;td&gt;2.88 (vs. clade local clock)&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;UCLN relaxed clock&lt;/td&gt;
      &lt;td&gt;-33539.62&lt;/td&gt;
      &lt;td&gt;24.24 (vs. stem local clock)&lt;/td&gt;
    &lt;/tr&gt;
  &lt;/tbody&gt;
&lt;/table&gt;

&lt;blockquote&gt;
  &lt;p&gt;&lt;strong&gt;Table 2&lt;/strong&gt; Marginal likelihood estimates (MLE) and Bayes factors for the models discussed here. The 3rd column gives the difference between the log MLE of each model and that of the model in the row above, which is the log Bayes factor comparing the two.&lt;/p&gt;
&lt;/blockquote&gt;
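&lt;p&gt;The Bayes factor arithmetic in Table 2 is just differences of log MLEs; exponentiating a log Bayes factor recovers the Bayes factor itself:&lt;/p&gt;

```python
import math

# Log marginal likelihood estimates (MLEs) from Table 2.
log_mle = {
    "strict clock":       -33661.96,
    "clade local clock":  -33566.74,
    "stem local clock":   -33563.86,
    "UCLN relaxed clock": -33539.62,
}

# The log Bayes factor of model A over model B is log MLE(A) minus log MLE(B).
clade_vs_strict = log_mle["clade local clock"] - log_mle["strict clock"]
ucln_vs_stem = log_mle["UCLN relaxed clock"] - log_mle["stem local clock"]

print(f"clade local vs strict: log BF = {clade_vs_strict:.2f}")
print(f"UCLN vs stem local:    log BF = {ucln_vs_stem:.2f} "
      f"(BF = {math.exp(ucln_vs_stem):.3g})")
```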

&lt;p&gt;The uncorrelated lognormal relaxed clock (UCLN) is the best fitting model (by a good margin) as it clearly accommodates some of the ‘slow-down’ in the two stem branches but also other variation in rate across the tree (Figure 4). However, the time and rate estimates are very variable.&lt;/p&gt;

&lt;p&gt;The good fit of the UCLN model suggests there is random variation in rate across the tree as well as the specific ‘latency’ slow downs. So we constructed a model that is a mix of the stem local clock and the UCLN — this essentially states that the two stem branches have their own rate and the rates for the rest of the tree are drawn from the lognormal relaxed clock (Figure 11).&lt;/p&gt;

&lt;figure style=&quot; width: 100%;&quot;&gt;&lt;img class=&quot;docimage&quot; src=&quot;images/news/EBOV_Reference_Set_15_LC2+UCLN.MCC.png&quot; alt=&quot;&quot; style=&quot;max-width: 100%&quot; /&gt;&lt;/figure&gt;

&lt;blockquote&gt;
  &lt;p&gt;&lt;strong&gt;Figure 11.&lt;/strong&gt; MCC tree constructed for the mix of the stem local clock model and the UCLN relaxed clock.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Overall, this tree is quite similar to the straight UCLN one (Figure 4) but with much tighter credible (HPD) intervals on the node ages, suggesting a better model fit (less of a struggle to fit competing patterns of rate variation). Indeed the log MLE for this model is &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;-33536.59&lt;/code&gt; giving a log Bayes factor of &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;3.03&lt;/code&gt; (more than 20-fold) over the UCLN model. The rates are comparable (Figure 12) but, as expected, the addition of the relaxed clock gives more variation in these.&lt;/p&gt;

&lt;figure style=&quot; width: 450px;&quot;&gt;&lt;img class=&quot;docimage&quot; src=&quot;images/news/Local_rates_LC2_LC2+UCLN.png&quot; alt=&quot;&quot; style=&quot;max-width: 450px&quot; /&gt;&lt;/figure&gt;

&lt;blockquote&gt;
  &lt;p&gt;&lt;strong&gt;Figure 12.&lt;/strong&gt; The rates for the tree and the two stem branches under the stem local clock model (left) and the stem local clock + relaxed clock model (right).&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h3 id=&quot;final-points&quot;&gt;Final points&lt;/h3&gt;

&lt;p&gt;Although we forced the rooting of the tree to be the same for each model, it is likely that the strict clock model and the relaxed clock model would give a different rooting (and possibly rates) if the constraint was removed.&lt;/p&gt;

&lt;p&gt;Finally, we are developing an explicit model of latency which will act as a molecular clock model, infer the branches that have evidence of latency and estimate parameters of the process. More on this soon.&lt;/p&gt;

&lt;h3 id=&quot;references&quot;&gt;References&lt;/h3&gt;

&lt;blockquote&gt;
  &lt;p&gt;Baize, S. et al., 2014. Emergence of Zaire Ebola Virus Disease in Guinea. The New England journal of medicine, 371(15), pp.1418–1425.&lt;/p&gt;

  &lt;p&gt;Carroll, S.A. et al., 2013. Molecular Evolution of Viruses of the Family Filoviridae Based on 97 Whole-Genome Sequences. Journal of virology, 87(5), pp.2608–2616.&lt;/p&gt;

  &lt;p&gt;Diallo, B. et al., 2016. Resurgence of Ebola Virus Disease in Guinea Linked to a Survivor With Virus Persistence in Seminal Fluid for More Than 500 Days. Clinical infectious diseases: an official publication of the Infectious Diseases Society of America, 63(10), pp.1353–1356.&lt;/p&gt;

  &lt;p&gt;Grard, G. et al., 2011. Emergence of divergent Zaire ebola virus strains in Democratic Republic of the Congo in 2007 and 2008. The Journal of infectious diseases, 204 Suppl 3, pp.S776–84.&lt;/p&gt;

  &lt;p&gt;Lam, T.T.-Y. et al., 2015. Puzzling origins of the Ebola outbreak in the Democratic Republic of the Congo, 2014. Journal of virology, pp.JVI.01226–15. &lt;a href=&quot;https://doi.org/10.1128/JVI.01226-15&quot;&gt;https://doi.org/10.1128/JVI.01226-15&lt;/a&gt;&lt;/p&gt;

  &lt;p&gt;Maganga, G.D. et al., 2014. Ebola Virus Disease in the Democratic Republic of Congo. The New England journal of medicine, 371(22), pp.2083–2091.&lt;/p&gt;

  &lt;p&gt;Mbala-Kingebeni, P., Pratt, C.B., et al., 2019. 2018 Ebola virus disease outbreak in Équateur Province, Democratic Republic of the Congo: a retrospective genomic characterisation. The Lancet infectious diseases. &lt;a href=&quot;http://dx.doi.org/10.1016/S1473-3099(19)30124-0&quot;&gt;http://dx.doi.org/10.1016/S1473-3099(19)30124-0&lt;/a&gt;.&lt;/p&gt;

  &lt;p&gt;Mbala-Kingebeni, P., Aziza, A., et al., 2019. Medical countermeasures during the 2018 Ebola virus disease outbreak in the North Kivu and Ituri Provinces of the Democratic Republic of the Congo: a rapid genomic assessment. The Lancet infectious diseases. &lt;a href=&quot;http://dx.doi.org/10.1016/S1473-3099(19)30118-5&quot;&gt;http://dx.doi.org/10.1016/S1473-3099(19)30118-5&lt;/a&gt;.&lt;/p&gt;

  &lt;p&gt;Nsio, J. et al., 2019. 2017 Outbreak of Ebola Virus Disease in Northern Democratic Republic of Congo. The Journal of infectious diseases. &lt;a href=&quot;https://doi.org/10.1093/infdis/jiz107&quot;&gt;https://doi.org/10.1093/infdis/jiz107&lt;/a&gt;&lt;/p&gt;
&lt;/blockquote&gt;
</description>
            <pubDate>Thu, 16 May 2019 00:00:00 +0000</pubDate>
            <link>http://github.com/beast-dev/ebov_local_clocks.html</link>
            <guid isPermaLink="true">http://github.com/beast-dev/ebov_local_clocks.html</guid>
            
            <category>article</category>
            
            
        </item>
        
        <item>
            <title>Measuring BEAST performance</title>
            <description>&lt;p&gt;When running BEAST it reports the time taken to calculate a certain number of states (e.g., minutes/million states). It is obviously tempting to compare this time between runs as a measure of performance.
However, unless you are testing the performance of the &lt;em&gt;same XML file&lt;/em&gt; on different hardware or for different parallelization options, this will never be a reliable measure and may lead you astray.&lt;/p&gt;

&lt;p&gt;The MCMC algorithm in BEAST picks operators (transition kernels or ‘moves’) from the list of potential operators proportional
to their given &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;weight&lt;/code&gt;. 
Some operators change a single parameter value, some change multiple parameters and others will alter the tree. 
BEAST tries to only recalculate the likelihood of the new state for the bits of the state that have changed. 
Thus some operators will only produce a modest amount of recomputation (e.g., changing a bit of the tree may only require the likelihood at a few nodes to be recalculated) whereas others will require a lot of computation (e.g., changing the evolutionary rate will require the recalculation of absolutely everything). 
Thus if the computationally heavy operators are given more &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;weight&lt;/code&gt; then the average time per operation over the course of the chain will go up. 
But this is not necessarily a bad thing.&lt;/p&gt;
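&lt;p&gt;The weight-proportional operator selection described above can be sketched in a few lines (the operator names and weights here are illustrative, not BEAST’s actual defaults):&lt;/p&gt;

```python
import random

# Sketch of weight-proportional operator selection, as described above.
# The operator names and weights here are illustrative, not BEAST defaults.
operators = {
    "scale(kappa)": 1.0,
    "scale(constant.popSize)": 3.0,
    "subtreeSlide(treeModel)": 15.0,
    "uniform(nodeHeights(treeModel))": 30.0,
}

def pick_operator(ops, rng):
    """Choose an operator with probability proportional to its weight."""
    names = list(ops)
    return rng.choices(names, weights=[ops[n] for n in names], k=1)[0]

rng = random.Random(42)
counts = {name: 0 for name in operators}
for _ in range(100_000):
    counts[pick_operator(operators, rng)] += 1
# Heavily weighted operators (here, the tree moves) dominate the move count,
# which is why their per-move cost dominates the total runtime.
```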

&lt;div class=&quot;alert alert-info&quot; role=&quot;alert&quot;&gt;&lt;i class=&quot;fa fa-info-circle&quot;&gt;&lt;/i&gt; &lt;b&gt;Note:&lt;/b&gt; This posting is primarily about improving the statistical performance of BEAST irrespective of the hardware being used. For a discussion of improving the computational performance on various types of hardware, &lt;a href=&quot;performance&quot;&gt;see this page&lt;/a&gt;.&lt;/div&gt;

&lt;h3 id=&quot;efficient-sampling-and-esss&quot;&gt;Efficient sampling and ESSs&lt;/h3&gt;

&lt;p&gt;The ultimate aim of an MCMC analysis is to get as many effectively independent samples from the posterior as possible (as measured by effective sample size, ESS). 
Ideally, we would aim to get the same ESS for all parameters in the model but we are often less interested in some parameters than others and we could allow a lower ESS for those. 
A high ESS is more important the more we are interested in the tails of the distribution of a parameter. 
So some parameters, in particular those that are part of the substitution model, such as the transition-transversion ratio &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;kappa&lt;/code&gt;, are down-weighted. 
Changing these is computationally expensive, requiring a complete recalculation of the likelihood for the partition, but we are rarely interested in the value. 
We simply want to marginalize our other parameters over their distributions. 
So we can accept a lower ESS for &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;kappa&lt;/code&gt; as the cost of focusing on other parameters.&lt;/p&gt;
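&lt;p&gt;To make the ESS concrete, here is a crude version of the standard autocorrelation-based estimator (the number of samples divided by the integrated autocorrelation time); this is a sketch of the idea rather than the exact algorithm Tracer uses:&lt;/p&gt;

```python
import random

def ess(samples):
    """Crude effective sample size: n / (1 + 2 * sum of autocorrelations),
    truncating the sum at the first non-positive autocorrelation."""
    n = len(samples)
    mean = sum(samples) / n
    var = sum((x - mean) ** 2 for x in samples) / n
    tau = 1.0  # integrated autocorrelation time
    for lag in range(1, n):
        cov = sum((samples[i] - mean) * (samples[i + lag] - mean)
                  for i in range(n - lag)) / n
        rho = cov / var
        if rho > 0.0:
            tau += 2.0 * rho
        else:
            break
    return n / tau

random.seed(1)
iid = [random.gauss(0, 1) for _ in range(2000)]   # independent draws
chain = [0.0]                                     # autocorrelated chain
for _ in range(1999):
    chain.append(0.9 * chain[-1] + random.gauss(0, 1))

# The autocorrelated chain yields far fewer effective samples than the
# same number of independent draws.
print(round(ess(iid)), round(ess(chain)))
```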

&lt;div class=&quot;alert alert-info&quot; role=&quot;alert&quot;&gt;&lt;i class=&quot;fa fa-info-circle&quot;&gt;&lt;/i&gt; &lt;b&gt;Note:&lt;/b&gt; In most cases, substitution model parameters easily achieve high ESS values, which is why they are typically updated less often than for example clock and coalescent model parameters.&lt;/div&gt;

&lt;p&gt;To demonstrate this we can look at an example BEAST run. 
This is a data set of 62 carnivore mitochondrial genome coding sequences giving a total of about 5,500 unique site patterns. 
The model was an &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;HKY+gamma&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;strict molecular clock&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;constant size coalescent&lt;/code&gt; (&lt;a href=&quot;/files/carnivores.HKYG.SC.CPC.classic.xml.zip&quot;&gt;the XML file is available here&lt;/a&gt;).
The data was run on &lt;a href=&quot;installing&quot;&gt;BEAST v1.10.4&lt;/a&gt; on a Dell server for 10M steps, for a total run time of &lt;strong&gt;4.08 hours&lt;/strong&gt;.&lt;/p&gt;

&lt;blockquote&gt;
  &lt;p&gt;Data: Carnivores mtDNA 62 taxa, 10869bp, 5565 unique site patterns&lt;/p&gt;

  &lt;p&gt;Model: HKY+G, Strict clock, Constant size coalescent&lt;/p&gt;

  &lt;p&gt;Machine: Dell Precision 3.10GHz Intel Xeon CPU E5-2687&lt;/p&gt;

  &lt;p&gt;XML file: &lt;a href=&quot;/files/carnivores.HKYG.SC.CPC.classic.xml.zip&quot;&gt;carnivores.HKYG.SC.CPC.classic.xml&lt;/a&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;You can see the effect of different operators by looking at the operator table reported at the end of the run:&lt;/p&gt;

&lt;blockquote&gt;
  &lt;p&gt;&lt;strong&gt;Table 1&lt;/strong&gt;&lt;/p&gt;
  &lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;Operator                                          Tuning  Count      Time     Time/Op  Pr(accept) 
scale(kappa)                                      0.913   92847      548576   5.91     0.2322      
frequencies                                       0.01    92528      547586   5.92     0.2355      
scale(alpha)                                      0.939   92835      548560   5.91     0.2317      
scale(nodeHeights(treeModel))                     0.927   277983     1651622  5.94     0.2329      
subtreeSlide(treeModel)                           0.013   2778288    2415749  0.87     0.2315      
Narrow Exchange(treeModel)                                2778497    1936405  0.7      0.0091      
Wide Exchange(treeModel)                                  277044     206821   0.75     0.0002      
wilsonBalding(treeModel)                                  277574     355560   1.28     0.0002      
scale(treeModel.rootHeight)                       0.262   277733     72405    0.26     0.2391      
uniform(nodeHeights(treeModel))                           2777648    2650767  0.95     0.1207      
scale(constant.popSize)                           0.474   277023     2697     0.01     0.2375      
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;  &lt;/div&gt;
&lt;/blockquote&gt;

&lt;p&gt;The operators on the substitution model (&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;kappa&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;frequencies&lt;/code&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;alpha&lt;/code&gt;) are amongst the most computationally expensive, taking on average 5.9 milliseconds per operation. 
Although they are not selected very often relative to the others (only about 3% of the time), they contribute over 15% of the total runtime.
On the other hand, the 7 operators that alter the tree generally have a low cost (about 1 millisecond per operation on average) but make up 85% of the total runtime because they are picked 95% of the time. 
The population size parameter is very cheap so comprises a tiny fraction of runtime even though it is called quite a lot.&lt;/p&gt;
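&lt;p&gt;These percentages can be recomputed directly from the Count and Time columns of Table 1:&lt;/p&gt;

```python
# Recomputing the summary figures quoted above from Table 1
# (operator name: (count, total time in ms)).
table = {
    "scale(kappa)":                    (92847,   548576),
    "frequencies":                     (92528,   547586),
    "scale(alpha)":                    (92835,   548560),
    "scale(nodeHeights(treeModel))":   (277983,  1651622),
    "subtreeSlide(treeModel)":         (2778288, 2415749),
    "Narrow Exchange(treeModel)":      (2778497, 1936405),
    "Wide Exchange(treeModel)":        (277044,  206821),
    "wilsonBalding(treeModel)":        (277574,  355560),
    "scale(treeModel.rootHeight)":     (277733,  72405),
    "uniform(nodeHeights(treeModel))": (2777648, 2650767),
    "scale(constant.popSize)":         (277023,  2697),
}
subst = ["scale(kappa)", "frequencies", "scale(alpha)"]
total_count = sum(c for c, t in table.values())
total_time = sum(t for c, t in table.values())
subst_count = sum(table[k][0] for k in subst)
subst_time = sum(table[k][1] for k in subst)
print(f"substitution ops: {100 * subst_count / total_count:.1f}% of moves, "
      f"{100 * subst_time / total_time:.1f}% of runtime")
```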

&lt;p&gt;By default the operators on each of the substitution model parameters have a &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;weight&lt;/code&gt; of 1, the sum of all tree operators has a &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;weight&lt;/code&gt; of 102 and the population size operator has a &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;weight&lt;/code&gt; of 3 (see the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;operators&lt;/code&gt; panel in BEAUti for the weights for each operator).&lt;/p&gt;

&lt;h3 id=&quot;mixing&quot;&gt;Mixing&lt;/h3&gt;

&lt;p&gt;A further complication is that different choices of operators, priors, etc., can affect the efficiency of &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;mixing&lt;/code&gt; of the MCMC (how fast it converges and explores the parameter space). 
Better mixing is reflected in a higher ESS, perhaps even at the cost of more computation per step; what matters is that the gain in ESS is proportionally higher than the computational cost.&lt;/p&gt;

&lt;p&gt;If we load the resulting log file into &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;Tracer&lt;/code&gt; we can calculate the ESS for these parameters (with a 10% burnin removed):&lt;/p&gt;

&lt;blockquote&gt;
  &lt;p&gt;&lt;strong&gt;Table 2&lt;/strong&gt;&lt;/p&gt;

  &lt;table&gt;
    &lt;thead&gt;
      &lt;tr&gt;
        &lt;th&gt;Parameter&lt;/th&gt;
        &lt;th style=&quot;text-align: right&quot;&gt;mean value&lt;/th&gt;
        &lt;th style=&quot;text-align: right&quot;&gt;ESS&lt;/th&gt;
      &lt;/tr&gt;
    &lt;/thead&gt;
    &lt;tbody&gt;
      &lt;tr&gt;
        &lt;td&gt;kappa&lt;/td&gt;
        &lt;td style=&quot;text-align: right&quot;&gt;27.18&lt;/td&gt;
        &lt;td style=&quot;text-align: right&quot;&gt;1503&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
        &lt;td&gt;frequencies1&lt;/td&gt;
        &lt;td style=&quot;text-align: right&quot;&gt;0.390&lt;/td&gt;
        &lt;td style=&quot;text-align: right&quot;&gt;1954&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
        &lt;td&gt;frequencies2&lt;/td&gt;
        &lt;td style=&quot;text-align: right&quot;&gt;0.305&lt;/td&gt;
        &lt;td style=&quot;text-align: right&quot;&gt;2590&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
        &lt;td&gt;frequencies3&lt;/td&gt;
        &lt;td style=&quot;text-align: right&quot;&gt;0.082&lt;/td&gt;
        &lt;td style=&quot;text-align: right&quot;&gt;2927&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
        &lt;td&gt;frequencies4&lt;/td&gt;
        &lt;td style=&quot;text-align: right&quot;&gt;0.223&lt;/td&gt;
        &lt;td style=&quot;text-align: right&quot;&gt;2286&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
        &lt;td&gt;alpha&lt;/td&gt;
        &lt;td style=&quot;text-align: right&quot;&gt;0.235&lt;/td&gt;
        &lt;td style=&quot;text-align: right&quot;&gt;4355&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
        &lt;td&gt;constant.popSize&lt;/td&gt;
        &lt;td style=&quot;text-align: right&quot;&gt;1.997&lt;/td&gt;
        &lt;td style=&quot;text-align: right&quot;&gt;8617&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
        &lt;td&gt;treeModel.rootHeight&lt;/td&gt;
        &lt;td style=&quot;text-align: right&quot;&gt;0.506&lt;/td&gt;
        &lt;td style=&quot;text-align: right&quot;&gt;1790&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
        &lt;td&gt;treeLength&lt;/td&gt;
        &lt;td style=&quot;text-align: right&quot;&gt;8.146&lt;/td&gt;
        &lt;td style=&quot;text-align: right&quot;&gt;1620&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
        &lt;td&gt;treeLikelihood&lt;/td&gt;
        &lt;td style=&quot;text-align: right&quot;&gt;-1.93E5&lt;/td&gt;
        &lt;td style=&quot;text-align: right&quot;&gt;3944&lt;/td&gt;
      &lt;/tr&gt;
    &lt;/tbody&gt;
  &lt;/table&gt;
&lt;/blockquote&gt;

&lt;p&gt;You can see that all of the ESSs are quite high. The two parameters that relate to the tree, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;treeModel.rootHeight&lt;/code&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;treeLength&lt;/code&gt; (the sum of all the branch lengths - not technically a parameter but a metric), show ESSs of &amp;gt;1000.
These values are not necessarily indicative of how well the tree is mixing overall, so we can also look at the ‘ESS’ for the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;likelihood&lt;/code&gt; (the likelihood of the data given the tree). 
This is a probability density, not a parameter, but looking at how (un)correlated its values are gives another indication of how well the tree has been mixing.&lt;/p&gt;

&lt;p&gt;The ESSs for the substitution model parameters are high, suggesting that we could afford to down-weight their operators to reduce their contribution to the total runtime.&lt;/p&gt;

&lt;h3 id=&quot;optimising-efficiency&quot;&gt;Optimising efficiency&lt;/h3&gt;

&lt;p&gt;To measure the overall efficiency of BEAST – i.e., the number of independent samples being generated per unit time (or per kWh of electricity) – it is probably best to consider ESS/hour for the parameters of interest.&lt;/p&gt;
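&lt;p&gt;The calculation itself is trivial. The sketch below is our own illustration: the total runtime of about 4.08 hours is inferred from the tables rather than stated in the text, but it reproduces the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;kappa&lt;/code&gt; row of the table that follows:&lt;/p&gt;

```python
def ess_per_hour(ess, runtime_hours):
    """Independent samples generated per hour of compute:
    the efficiency measure used throughout this post."""
    return ess / runtime_hours

# Hypothetical check, assuming a total runtime of about 4.08 hours.
print(round(ess_per_hour(1503, 4.08)))  # 368
```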

&lt;blockquote&gt;
  &lt;p&gt;&lt;strong&gt;Table 3&lt;/strong&gt;&lt;/p&gt;

  &lt;table&gt;
    &lt;thead&gt;
      &lt;tr&gt;
        &lt;th&gt;Parameter&lt;/th&gt;
        &lt;th style=&quot;text-align: right&quot;&gt;ESS&lt;/th&gt;
        &lt;th style=&quot;text-align: right&quot;&gt;ESS/hour&lt;/th&gt;
      &lt;/tr&gt;
    &lt;/thead&gt;
    &lt;tbody&gt;
      &lt;tr&gt;
        &lt;td&gt;kappa&lt;/td&gt;
        &lt;td style=&quot;text-align: right&quot;&gt;1503&lt;/td&gt;
        &lt;td style=&quot;text-align: right&quot;&gt;368&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
        &lt;td&gt;constant.popSize&lt;/td&gt;
        &lt;td style=&quot;text-align: right&quot;&gt;8617&lt;/td&gt;
        &lt;td style=&quot;text-align: right&quot;&gt;2109&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
        &lt;td&gt;treeModel.rootHeight&lt;/td&gt;
        &lt;td style=&quot;text-align: right&quot;&gt;1790&lt;/td&gt;
        &lt;td style=&quot;text-align: right&quot;&gt;438&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
        &lt;td&gt;treeLength&lt;/td&gt;
        &lt;td style=&quot;text-align: right&quot;&gt;1620&lt;/td&gt;
        &lt;td style=&quot;text-align: right&quot;&gt;396&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
        &lt;td&gt;treeLikelihood&lt;/td&gt;
        &lt;td style=&quot;text-align: right&quot;&gt;3944&lt;/td&gt;
        &lt;td style=&quot;text-align: right&quot;&gt;965&lt;/td&gt;
      &lt;/tr&gt;
    &lt;/tbody&gt;
  &lt;/table&gt;
&lt;/blockquote&gt;

&lt;p&gt;Focusing on &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;kappa&lt;/code&gt; as representative of the substitution model, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;rootHeight&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;treeLength&lt;/code&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;treeLikelihood&lt;/code&gt; to represent the tree, and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;constant.popSize&lt;/code&gt; the coalescent model, we can calculate the ESS/hour for the above run.&lt;/p&gt;

&lt;p&gt;If we reduce the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;weight&lt;/code&gt; of the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;kappa&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;alpha&lt;/code&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;frequencies&lt;/code&gt; operators by a factor of 10 (this can be done in BEAUti’s operator table or by editing the XML), the total runtime goes down to &lt;strong&gt;3.73 hours&lt;/strong&gt; – about a 10% saving.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Which is nice.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;The ESSs for &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;kappa&lt;/code&gt; (and the other parameters with down-weighted operators) predictably go down but are still reasonable:&lt;/p&gt;

&lt;blockquote&gt;
  &lt;p&gt;&lt;strong&gt;Table 4&lt;/strong&gt;&lt;/p&gt;

  &lt;table&gt;
    &lt;thead&gt;
      &lt;tr&gt;
        &lt;th&gt;Parameter&lt;/th&gt;
        &lt;th style=&quot;text-align: right&quot;&gt;ESS&lt;/th&gt;
        &lt;th style=&quot;text-align: right&quot;&gt;ESS/hour&lt;/th&gt;
      &lt;/tr&gt;
    &lt;/thead&gt;
    &lt;tbody&gt;
      &lt;tr&gt;
        &lt;td&gt;kappa&lt;/td&gt;
        &lt;td style=&quot;text-align: right&quot;&gt;515&lt;/td&gt;
        &lt;td style=&quot;text-align: right&quot;&gt;138&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
        &lt;td&gt;constant.popSize&lt;/td&gt;
        &lt;td style=&quot;text-align: right&quot;&gt;9001&lt;/td&gt;
        &lt;td style=&quot;text-align: right&quot;&gt;2414&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
        &lt;td&gt;treeModel.rootHeight&lt;/td&gt;
        &lt;td style=&quot;text-align: right&quot;&gt;1090&lt;/td&gt;
        &lt;td style=&quot;text-align: right&quot;&gt;292&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
        &lt;td&gt;treeLength&lt;/td&gt;
        &lt;td style=&quot;text-align: right&quot;&gt;922&lt;/td&gt;
        &lt;td style=&quot;text-align: right&quot;&gt;247&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
        &lt;td&gt;treeLikelihood&lt;/td&gt;
        &lt;td style=&quot;text-align: right&quot;&gt;2793&lt;/td&gt;
        &lt;td style=&quot;text-align: right&quot;&gt;749&lt;/td&gt;
      &lt;/tr&gt;
    &lt;/tbody&gt;
  &lt;/table&gt;
&lt;/blockquote&gt;

&lt;p&gt;Note that the ESSs for &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;treeModel.rootHeight&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;treeLength&lt;/code&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;treeLikelihood&lt;/code&gt; have also gone down (though not to as great a degree as &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;kappa&lt;/code&gt;) and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;constant.popSize&lt;/code&gt; has actually gone up in ESS (to the maximum, where every sample is effectively independent). 
So by down-weighting the substitution model operators we have reduced the ESS/hour across the board (with the exception of the coalescent prior).
It is still possible that the tree topology is mixing better but we aren’t measuring that directly.&lt;/p&gt;

&lt;p&gt;We could look at reducing the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;weight&lt;/code&gt; of the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;constant.popSize&lt;/code&gt; operator by a factor of 3 (returning the substitution model operators back to their original weights). 
The total run time goes up to &lt;strong&gt;4.16 hours&lt;/strong&gt; because we are doing fewer cheap moves and more expensive ones – but the ESS/hour for all the other parameters goes up:&lt;/p&gt;

&lt;blockquote&gt;
  &lt;p&gt;&lt;strong&gt;Table 5&lt;/strong&gt;&lt;/p&gt;

  &lt;table&gt;
    &lt;thead&gt;
      &lt;tr&gt;
        &lt;th&gt;Parameter&lt;/th&gt;
        &lt;th style=&quot;text-align: right&quot;&gt;ESS&lt;/th&gt;
        &lt;th style=&quot;text-align: right&quot;&gt;ESS/hour&lt;/th&gt;
      &lt;/tr&gt;
    &lt;/thead&gt;
    &lt;tbody&gt;
      &lt;tr&gt;
        &lt;td&gt;kappa&lt;/td&gt;
        &lt;td style=&quot;text-align: right&quot;&gt;1708&lt;/td&gt;
        &lt;td style=&quot;text-align: right&quot;&gt;410&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
        &lt;td&gt;constant.popSize&lt;/td&gt;
        &lt;td style=&quot;text-align: right&quot;&gt;6992&lt;/td&gt;
        &lt;td style=&quot;text-align: right&quot;&gt;1679&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
        &lt;td&gt;treeModel.rootHeight&lt;/td&gt;
        &lt;td style=&quot;text-align: right&quot;&gt;1751&lt;/td&gt;
        &lt;td style=&quot;text-align: right&quot;&gt;421&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
        &lt;td&gt;treeLength&lt;/td&gt;
        &lt;td style=&quot;text-align: right&quot;&gt;1812&lt;/td&gt;
        &lt;td style=&quot;text-align: right&quot;&gt;435&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
        &lt;td&gt;treeLikelihood&lt;/td&gt;
        &lt;td style=&quot;text-align: right&quot;&gt;4219&lt;/td&gt;
        &lt;td style=&quot;text-align: right&quot;&gt;1013&lt;/td&gt;
      &lt;/tr&gt;
    &lt;/tbody&gt;
  &lt;/table&gt;
&lt;/blockquote&gt;
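&lt;p&gt;The runtime effect here is just a weighted average: operators are picked with probability proportional to their weight, so the expected cost of one MCMC step is the weight-weighted mean of the per-operator costs. A minimal sketch with invented relative costs (the real per-operator costs are not reported in this post):&lt;/p&gt;

```python
def expected_cost(weights, costs):
    """Expected computational cost of one MCMC step when each operator
    is picked with probability proportional to its weight."""
    total_weight = sum(weights.values())
    return sum(weights[op] * costs[op] for op in weights) / total_weight

# Invented relative costs: a population-size move touches only the
# coalescent prior (cheap), a tree move recomputes the likelihood (dear).
costs = {"popSize": 1.0, "treeMove": 100.0}

before = expected_cost({"popSize": 30, "treeMove": 30}, costs)
after = expected_cost({"popSize": 10, "treeMove": 30}, costs)
# Down-weighting the cheap operator raises the expected cost per step,
# which is why the total runtime went up.
```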

&lt;h3 id=&quot;operator-acceptance-rates&quot;&gt;Operator acceptance rates&lt;/h3&gt;

&lt;p&gt;One other thing to note here is the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;Pr(accept)&lt;/code&gt; column in the operator analysis, &lt;strong&gt;Table 1&lt;/strong&gt;, above. 
This records how often a proposed operation is actually accepted according to the Metropolis-Hastings algorithm. 
A rule of thumb is that a move should be accepted about 23% of the time to be optimally efficient (this is an analytical result for certain continuous moves but we assume it also approximately applies for tree moves). 
Operators are generally ‘tuned’ to achieve this ratio by adjusting the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;size&lt;/code&gt; of the move (how big a change is made to the parameter – big moves will be accepted less often than small ones). 
Some moves (&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;Narrow Exchange&lt;/code&gt;, &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;Wide Exchange&lt;/code&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;WilsonBalding&lt;/code&gt;) are not tunable and you can see they have a very small acceptance probability. 
This means they are inefficient at exploring tree-space yet consume considerable computational time. 
On the other hand, they may be important for initial convergence, when large moves are favoured.&lt;/p&gt;

&lt;p&gt;We can try reweighting these operators down by a factor of 10 and see the effect.&lt;/p&gt;

&lt;p&gt;Firstly the total runtime is &lt;strong&gt;4.33 hours&lt;/strong&gt; – more than 6% slower than our original run. 
However, if we look at the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;ESS&lt;/code&gt; and &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;ESS/hour&lt;/code&gt; values:&lt;/p&gt;

&lt;blockquote&gt;
  &lt;p&gt;&lt;strong&gt;Table 6&lt;/strong&gt;&lt;/p&gt;

  &lt;table&gt;
    &lt;thead&gt;
      &lt;tr&gt;
        &lt;th&gt;Parameter&lt;/th&gt;
        &lt;th style=&quot;text-align: right&quot;&gt;ESS&lt;/th&gt;
        &lt;th style=&quot;text-align: right&quot;&gt;ESS/hour&lt;/th&gt;
      &lt;/tr&gt;
    &lt;/thead&gt;
    &lt;tbody&gt;
      &lt;tr&gt;
        &lt;td&gt;kappa&lt;/td&gt;
        &lt;td style=&quot;text-align: right&quot;&gt;2455&lt;/td&gt;
        &lt;td style=&quot;text-align: right&quot;&gt;567&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
        &lt;td&gt;constant.popSize&lt;/td&gt;
        &lt;td style=&quot;text-align: right&quot;&gt;7138&lt;/td&gt;
        &lt;td style=&quot;text-align: right&quot;&gt;1650&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
        &lt;td&gt;treeModel.rootHeight&lt;/td&gt;
        &lt;td style=&quot;text-align: right&quot;&gt;2586&lt;/td&gt;
        &lt;td style=&quot;text-align: right&quot;&gt;598&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
        &lt;td&gt;treeLength&lt;/td&gt;
        &lt;td style=&quot;text-align: right&quot;&gt;2719&lt;/td&gt;
        &lt;td style=&quot;text-align: right&quot;&gt;628&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
        &lt;td&gt;treeLikelihood&lt;/td&gt;
        &lt;td style=&quot;text-align: right&quot;&gt;4873&lt;/td&gt;
        &lt;td style=&quot;text-align: right&quot;&gt;1126&lt;/td&gt;
      &lt;/tr&gt;
    &lt;/tbody&gt;
  &lt;/table&gt;
&lt;/blockquote&gt;

&lt;p&gt;We are generally doing much better than before, with &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;ESS/hour&lt;/code&gt; up over the previous runs (the only loser is &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;constant.popSize&lt;/code&gt;, but it is still higher than all the others).&lt;/p&gt;

&lt;h3 id=&quot;concluding-remarks&quot;&gt;Concluding remarks&lt;/h3&gt;

&lt;ul&gt;
  &lt;li&gt;
    &lt;p&gt;Don’t use time/sample as a comparative measure of performance for different data or sampling regimes.&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;A better measure of BEAST performance than the average time per million steps is the number of effectively independent samples generated per unit time (i.e., ESS/hour). 
In the example above, the &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;treeLength&lt;/code&gt; measure goes from 396 independent values per hour to 628, an increase of nearly 60%.&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;Choosing operator weights to achieve better performance (as ESS/hour) is a difficult balancing act and may need multiple runs and examination of operator analyses and ESSs. 
It is usually better to be conservative about these and to worry more about getting statistically correct results than about saving a few hours of runtime.&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;Because of the stochastic nature of the algorithm, BEAST can vary from run to run both in total runtime (because of variability in which operators are picked and their computational cost) and in the ESS of parameters. 
The run time will also depend on what else the computer is doing at the same time (these results were done on a many core machine with nothing else of significance running).&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;The optimal weights for operators will also vary considerably by data set meaning it is difficult to come up with reliable rules.&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;We are currently working on improving the operators and weights to achieve a reliable increase in statistical performance. More on this soon …&lt;/p&gt;
  &lt;/li&gt;
&lt;/ul&gt;

&lt;div class=&quot;alert alert-info&quot; role=&quot;alert&quot;&gt;&lt;i class=&quot;fa fa-info-circle&quot;&gt;&lt;/i&gt; &lt;b&gt;Note:&lt;/b&gt; The operator weights that BEAUti generates by default are intended to be robust (we want to try to ensure convergence) and may not be optimal in all circumstances. Adjustment of these might achieve significant improvements in ESS/hour but caution should be exercised and the results examined closely to ensure that convergence has been achieved. As always we strongly recommend that at least 2 replicate runs are performed and the results compared.&lt;/div&gt;

</description>
            <pubDate>Sat, 17 Nov 2018 00:00:00 +0000</pubDate>
            <link>http://github.com/beast-dev/measuring-beast-performance.html</link>
            <guid isPermaLink="true">http://github.com/beast-dev/measuring-beast-performance.html</guid>
            
            <category>article</category>
            
            
        </item>
        
        <item>
            <title>BEAST v1.10.4 released</title>
            <description>&lt;h3 id=&quot;we-are-pleased-to-announce-the-release-of-beast-v1104&quot;&gt;We are pleased to announce the release of BEAST v1.10.4&lt;/h3&gt;

&lt;p&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;BEAST v1.10.4&lt;/code&gt; fixes a bug when trying to specify a burnin on the command line version of LogCombiner. It also introduces two new command line options specific to BEAGLE v3.1 (-beagle_threading_off and -beagle_thread_count).&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;installing&quot;&gt;Download BEAST v1.10.4 binaries for Mac, Windows and UNIX/Linux&lt;/a&gt;&lt;/p&gt;

</description>
            <pubDate>Wed, 14 Nov 2018 00:00:00 +0000</pubDate>
            <link>http://github.com/beast-dev/2018-11-14_BEAST_v1.10.4_released.html</link>
            <guid isPermaLink="true">http://github.com/beast-dev/2018-11-14_BEAST_v1.10.4_released.html</guid>
            
            <category>news</category>
            
            
        </item>
        
        <item>
            <title>BEAST v1.10.3 released</title>
            <description>&lt;h3 id=&quot;we-are-pleased-to-announce-the-release-of-beast-v1103&quot;&gt;We are pleased to announce the release of BEAST v1.10.3&lt;/h3&gt;

&lt;p&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;BEAST v1.10.3&lt;/code&gt; fixes an important bug where performance was degraded when using BEAGLE 3 on CPUs (compared with using BEAGLE 2 or using GPUs).&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;installing&quot;&gt;Download BEAST v1.10.3 binaries for Mac, Windows and UNIX/Linux&lt;/a&gt;&lt;/p&gt;

</description>
            <pubDate>Sun, 28 Oct 2018 00:00:00 +0000</pubDate>
            <link>http://github.com/beast-dev/2018-10-28_BEAST_v1.10.3_released.html</link>
            <guid isPermaLink="true">http://github.com/beast-dev/2018-10-28_BEAST_v1.10.3_released.html</guid>
            
            <category>news</category>
            
            
        </item>
        
        <item>
            <title>BEAST v1.10.0 released</title>
            <description>&lt;h3 id=&quot;we-are-pleased-to-announce-the-release-of-beast-v110&quot;&gt;We are pleased to announce the release of BEAST v1.10&lt;/h3&gt;

&lt;p&gt;&lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;BEAST v1.10.0&lt;/code&gt; is a major new version with many new features which focus on flexibility of model specification, 
integration of different data sources, and increasing the speed and efficiency of sampling.&lt;/p&gt;

&lt;p&gt;The new version coincides with the publication of a paper describing many of the new features:&lt;/p&gt;

&lt;blockquote&gt;
  &lt;p&gt;Suchard MA, Lemey P, Baele G, Ayres DL, Drummond AJ &amp;amp; Rambaut A (2018) Bayesian phylogenetic and phylodynamic data integration using BEAST 1.10 &lt;em&gt;Virus Evolution&lt;/em&gt; &lt;strong&gt;4&lt;/strong&gt;, vey016. &lt;a href=&quot;https://doi.org/10.1093/ve/vey016&quot;&gt;DOI:10.1093/ve/vey016&lt;/a&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;a href=&quot;installing&quot;&gt;Download BEAST v1.10.0 binaries for Mac, Windows and UNIX/Linux&lt;/a&gt;&lt;/p&gt;

</description>
            <pubDate>Sun, 10 Jun 2018 00:00:00 +0000</pubDate>
            <link>http://github.com/beast-dev/2018-06-10_BEAST_v1.10.0_released.html</link>
            <guid isPermaLink="true">http://github.com/beast-dev/2018-06-10_BEAST_v1.10.0_released.html</guid>
            
            <category>news</category>
            
            
        </item>
        
        <item>
            <title>BEAST v1.8.4 released</title>
            <description>&lt;p&gt;BEAST v1.8.4 has been released:&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;installing.html&quot;&gt;Download BEAST v1.8.4 binaries for Mac, Windows and UNIX/Linux&lt;/a&gt;&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;Version 1.8.4 released 17th June 2016
New Features:

    New structured list of citations printed to screen before running.
    Option (&apos;-citation_file&apos;) to write citation list to file.
    Option in BEAUti Priors panel to set parameters to &apos;Fixed Value&apos;

Bug Fixes:

    Issue 808: Set autoOptimize to false in the randomWalkOperator on 
               Pagel&apos;s lambda
    Issue 806: SRD06 in BEAUTi selecting incorrect options.
    Issue 799: Relative rate parameters for partitions were not being 
               created. All partitions within a clock model have a 
               relative rate if their substitution models are unlinked.
    Issue 798: Calculating pairwise distances was slow for big data sets -
               removed this (but initial values no longer suggested based
               on data).
    Issue 797: Removed &apos;meanRate&apos; from Priors tab in BEAUti.
    Issue 794: Running with empty command line causes error.
    Issue 792: Check to see that the same likelihood isn&apos;t included multiple
               times into the density.
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

</description>
            <pubDate>Fri, 17 Jun 2016 00:00:00 +0000</pubDate>
            <link>http://github.com/beast-dev/2016-06-17_BEAST_v1.8.4_released.html</link>
            <guid isPermaLink="true">http://github.com/beast-dev/2016-06-17_BEAST_v1.8.4_released.html</guid>
            
            <category>news</category>
            
            
        </item>
        
    </channel>
</rss>
