this is completely false: i consider it bad science to publish computational results without accompanying experimental validation. i came from theoretical physics (but actually computer science, long story), where they are really antsy about it because of a number of high-profile retractions over the years, though in that case analytical proof is also expected. i don't trust computational results without complementary evidence, and it's why i'm wary of your claims about this weird phylogeny: phylogenetics is well known to be dodgy. i have never published without experimental verification of my results. the role of computing, in my view, is to reduce the burden on experimentalists: i can point them in the right direction, and together we achieve results that would be impossible without computational input. as i said earlier, in science it is a good idea to take nothing at face value and take no-one's word: Nullius in verba.
Ideas stand or fall on the evidence, which is why I tend to be leery of the bio-proteomics-metabolomics-omics informatics revolution. On the whole it falls short of meeting the threshold for reproducible, repeatable, testable science. The most interesting feature of nu-science is the lack of supporting evidence, the lack of critical peer review, and the lack of engagement with people asking difficult questions like "so..?", "and..?", "why..?" The response is "Computer says yes. Model says yes." That is why I asked for your comment on the sequencing field, where you are comfortable.
the results of this collaboration between bio and computer scientists speak for themselves, but i do agree, as i said in a previous post, that a lot of dodgy shit is done. unfortunately, this is because a lot of biologists who lack even basic mathematical skills, let alone complexity theory, think they can design and implement algorithms themselves with no formal engineering tuition or experience. obviously i will defend my own profession, but the things i've seen in code bases since moving to the biosciences make my skin crawl. thankfully people are wising up to it, and people such as myself are able to find gainful and interesting employment as a result.
back to the sequencing:
I don't think using R is significant per se; R is commonly used all over the place because of the bundled stats and graphing features. Yes, it would be more computationally efficient not to use R, but the end result should be the same. What I am more interested in is your comment about how the sequencing was performed: from what I can determine it could have been done using many different methods, as CoG use rapid sequencing and many flavors of technique in different labs, so the CoG statement looks like hand-waving.
My understanding is that the depth of the sequence overlap and the reliability of the technique are key, yet so far these data are not obviously public. Remember the variant was sequenced in early October from the MK sample, so at least 60 days ago. I am much more interested in how robust this all is: sequencing and tracking lineages has the potential to give useful information, but knowing how fuzzy that information is would be very useful; there is no point knowing anything without knowing the error.
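To put the depth point in concrete terms, here is a back-of-the-envelope sketch using the classic Lander-Waterman approximation. The run parameters are invented for illustration; they are not CoG's actual numbers, which as I say are not obviously public.

```python
import math

def mean_coverage(num_reads: int, read_len: int, genome_len: int) -> float:
    """Lander-Waterman mean coverage: c = N * L / G."""
    return num_reads * read_len / genome_len

def frac_uncovered(coverage: float) -> float:
    """Expected fraction of bases with zero coverage, ~ e^(-c)
    under the Poisson approximation."""
    return math.exp(-coverage)

# Illustrative numbers only: a ~30 kb coronavirus genome, 150 bp reads.
genome_len = 30_000
read_len = 150
for num_reads in (1_000, 5_000, 20_000):
    c = mean_coverage(num_reads, read_len, genome_len)
    print(f"{num_reads:>6} reads -> mean depth {c:5.1f}x, "
          f"~{frac_uncovered(c):.2%} of bases expected uncovered")
```

The point being: without knowing the depth, you cannot even estimate how much of the genome was reliably called, let alone the per-base error.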
But if we go ahead with the assumption that the sequence and the genetic distance are correct, then a plausible explanation of how this could happen is needed. The thing went stealth for a long time and re-appeared with a lot of nice new mutations; if this is something that can happen again and again, then this mechanism may be a big deal.
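For what it is worth, the simplest distance underlying claims like this is just the proportion of differing sites in an alignment (the p-distance); real phylogenetic distances apply corrections on top of this. A toy sketch, with invented sequences:

```python
def p_distance(seq_a: str, seq_b: str) -> float:
    """Proportion of differing sites between two aligned sequences;
    gaps and ambiguous bases are skipped."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to equal length")
    valid = [(a, b) for a, b in zip(seq_a, seq_b)
             if a in "ACGT" and b in "ACGT"]
    if not valid:
        return float("nan")
    diffs = sum(a != b for a, b in valid)
    return diffs / len(valid)

# Toy aligned fragments, purely illustrative:
print(p_distance("ACGTACGTAC", "ACGTTCGTAA"))  # 0.2
```

Note that any sequencing error feeds straight into this number, which is exactly why the error bars matter.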
the use of R signifies the plotting was probably done as part of an off-the-shelf package (as in part of bioconductor; i prefer seaborn, which is also an off-the-shelf package, but for a real programming language with a proper type system!). i said earlier there will be no good off-the-shelf package for this type of analysis, especially in light of your suggestion that many different library prep and sequencing modalities were used to generate the data. usually packages are specific to one type of data (illumina, nanopore, illumina, pacbio, illumina, sanger cos it was written in 1996, illumina, illumina) while claiming they can work for all sequencing modalities. anyway, it's possible that the data plotted were obtained using bespoke code, so this is just speculation, but when i see a graph plotted in R i can be pretty sure whoever plotted it doesn't know much about programming.
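for illustration, this is all i mean by "off the shelf" plotting in seaborn; the data frame here is made up and has nothing to do with the CoG data:

```python
# Minimal off-the-shelf plotting in seaborn; invented data for illustration.
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

df = pd.DataFrame({
    "week": [1, 2, 3, 4, 5],
    "lineage_count": [3, 7, 12, 30, 55],
})
sns.lineplot(data=df, x="week", y="lineage_count")
plt.xlabel("Week")
plt.ylabel("Sequences assigned to lineage")
plt.tight_layout()
plt.show()
```

a few lines and you get sensible defaults, which is exactly why people reach for these packages whether or not they can program.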
a possible explanation of how this could occur is needed, but it really is too early to tell. until they have published their data and methods it's not possible for an outsider to suggest a plausible explanation, so the one mentioned in the paper is our best guess. it almost certainly can happen again and again, and that is indeed a concern. thankfully, if they change the spike protein too much it will no longer be able to bind to receptors, and hopefully the vaccine should still work with more minor modifications of the spike protein.
@JGrimez quoting VAERS really merits no response. anyone can submit a report to it without demonstrating that the adverse event is linked to the vaccination.
the principle of vaccination has been around for hundreds of years, longer than the modern gold standard for clinical trials.