
Biology most complete human genome yet

Skorpio

article:


tbh i was surprised that we hadn't done this yet. when i was working in genome assembly 5 years ago we considered human complete from that perspective; diploid, highly homozygous genomes (i.e. ones where the chromosome pairs match each other at the vast majority of bases) aren't that difficult to assemble. and human has a lot of money thrown at it compared to other organisms cos of the medical applications of human genome research.

doesn't surprise me that telomeres and centromeres were the problem regions, cos bits of DNA that are just like tatatatatatatatatatatatatatatatata for tens of thousands of bases confound sequencers and software alike. not surprised ONT sequencing was used; they don't actually specify, but i'm not sure there are any other realistic candidates for the nanopore sequencing they mention.
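to make the repeat problem concrete, here's a minimal python sketch. the toy sequences and the helper name count_placements are made up for illustration, they're not from the article. a read that sits entirely inside a tandem repeat matches at lots of positions, while a read long enough to reach unique sequence on both sides matches at exactly one.

```python
def count_placements(genome: str, read: str) -> int:
    """Count every start position where `read` matches `genome` exactly.

    Overlapping matches are counted, which is what matters for repeats.
    """
    return sum(
        1
        for start in range(len(genome) - len(read) + 1)
        if genome.startswith(read, start)
    )


# toy genome (hypothetical): unique flank + 20 copies of "ta" + unique flank
left_flank = "gcgtacgg"
right_flank = "ccgatgca"
genome = left_flank + "ta" * 20 + right_flank

short_read = "tatatata"                      # lies entirely inside the repeat
spanning_read = "acgg" + "ta" * 20 + "ccga"  # reaches unique sequence on both sides

print("short read placements:   ", count_placements(genome, short_read))
print("spanning read placements:", count_placements(genome, spanning_read))
```

real aligners and assemblers are far more sophisticated than exact string matching, but the ambiguity they have to deal with is the same.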

am i surprised that the next subject for a high quality whole genome sequence is another white male? no. do fucking better biology.

the human pangenome project, capturing a significant proportion of the variation within the human genome, will be really interesting. as long as they remember that some humans are not white men.
What are your thoughts on nanopore versus other NGS technologies? (I remember seeing one at the drug company Regeneron's campus that uses a slowed-down polymerase, with each nucleotide fluorescently tagged, getting fairly long reads off of a single molecule.)

I only have practical experience with sanger, and am decidedly not immersed in omics.
 
i don't really know how they work beyond illumina being some sort of optical sequencing, whereas nanopore is based on the change in electrical current each base causes as it's passed through the pore.

in terms of actually using the data, they all just produce bases at the end of the day. the difference between technologies is the length of the sequences you get and the confidence you get that each base has been called correctly.
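that per-base confidence is usually reported as a phred quality score, where Q = -10 * log10(probability the call is wrong). a rough sketch, assuming FASTQ-style quality strings with the common phred+33 encoding; the function names and the example quality string are made up for illustration:

```python
def decode_quality(quality_string: str, offset: int = 33) -> list[int]:
    """Turn a FASTQ quality string into Phred scores (assumes Phred+33 by default)."""
    return [ord(ch) - offset for ch in quality_string]


def phred_to_error_prob(q: int) -> float:
    """Probability that a base call with Phred score q is wrong."""
    return 10 ** (-q / 10)


# hypothetical 5-base read with its quality string
bases = "ACGTA"
quality_string = "II5+!"

for base, q in zip(bases, decode_quality(quality_string)):
    print(f"{base}  Q{q:<3} p(error) = {phred_to_error_prob(q):.4f}")
```

so Q10 means a 1 in 10 chance the base is wrong, Q20 is 1 in 100, Q40 is 1 in 10,000.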

we used exclusively illumina when i worked in assembly, where you need to be extremely confident that each base is correct. you can get good results from relatively short sequences (standard for illumina is 250 bp) by building clever algorithms and using technologies that get you longer range information. paired end reads give you pairs of reads that are separated by an easily deducible gap. same goes for long mate pairs, where if you have really good lab technicians you can get reads that jump over long repetitive sequences such as retrotransposons. you need to be able to bridge them to get to the unique content at each end, so you can work out which bits of the genome go at each end of a retrotransposon. technologies such as 10x let you barcode regions of several hundred kb, giving you even longer range info from short reads.
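a back-of-the-envelope sketch of the bridging idea, in python. everything here is a simplification i'm assuming for illustration: coordinates are on a made-up reference, insert_size means the distance from the start of read 1 to the end of read 2, read orientation is ignored, and the repeat length and insert sizes are hypothetical numbers rather than anything from a real library prep.

```python
def max_bridgeable_gap(read_len: int, insert_size: int) -> int:
    """Longest stretch a pair can jump over with both reads fully outside it."""
    return max(insert_size - 2 * read_len, 0)


def pair_bridges_repeat(
    read1_start: int,
    read_len: int,
    insert_size: int,
    repeat_start: int,
    repeat_end: int,
) -> bool:
    """True if read 1 ends before the repeat and read 2 starts after it."""
    read1_end = read1_start + read_len
    read2_start = read1_start + insert_size - read_len
    return read1_end <= repeat_start and read2_start >= repeat_end


# hypothetical 6 kb repeat sitting at positions 10,000-16,000
repeat_start, repeat_end = 10_000, 16_000

for label, insert_size in [("paired end", 500), ("long mate pair", 8_000)]:
    bridged = pair_bridges_repeat(9_700, 250, insert_size, repeat_start, repeat_end)
    print(f"{label}: insert {insert_size}, bridges repeat = {bridged}, "
          f"max gap = {max_bridgeable_gap(250, insert_size)}")
```

the short-insert pair has both reads land before or inside the repeat, whereas the long mate pair gets one read anchored in unique sequence on each side, which is what lets you say which flank goes with which.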

then we have nanopore. this typically gives much longer reads, 1k-10k bases easily, at the cost of errors. traditionally, the error rate in ONT reads was much, much higher: 20% when i first heard of it, 10% recently. but now they have only gone and produced new chemistry kits that enable them to get similar qualities to illumina. you can get into the 100s of kb with nanopore if you are lucky. so suddenly they have gone from 'v cheap and OK for getting a rough idea of what's going on' to rivalling illumina, the workhorse of bioinformatics, for quality, and smashing them on length and price.
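to put those error rates in perspective, here's some quick arithmetic in python. it assumes errors are independent and uniform along the read, which real reads don't obey, so treat it as a rough sketch. the 20% and 10% figures are the ones above; the 0.1% figure is purely an illustrative stand-in for a 'high quality' read, not a quoted spec.

```python
def expected_errors(read_length: int, error_rate: float) -> float:
    """Expected number of miscalled bases in a read (independent-error assumption)."""
    return read_length * error_rate


def prob_error_free(read_length: int, error_rate: float) -> float:
    """Chance that every base in the read is called correctly."""
    return (1 - error_rate) ** read_length


read_length = 1_000  # bases, the low end of the 1k-10k range

for error_rate in (0.20, 0.10, 0.001):
    print(
        f"error rate {error_rate:>6.1%}: "
        f"~{expected_errors(read_length, error_rate):.0f} errors per read, "
        f"p(no errors) = {prob_error_free(read_length, error_rate):.3g}"
    )
```

which is why long-but-noisy reads used to be good for structure and rough placement rather than for calling individual bases with confidence.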

for things like certain viruses with short segmented genomes, for example influenza, you can basically get the full genome very quickly with a high degree of confidence. for even shorter RNA viruses like polio, you probably don't have to run your sequencer for too long before you have a few full copies of its genome. no assembly required.

if i had any money i would invest in ONT. if i had shares in illumina, i'd sell them and put the money in ONT.

for human or other eukaryotes, the difference between 250bp reads and unbounded length, now that we have better quality, should make things much easier.
 
this might be relevant to anyone who, like me, heard this news and said 'ooohhhhh centromeres, fuck all happens there so why should we care:'


turns out that HPV likes to integrate there. not sure what that means though, i.e. whether it contributes to HPV's disruption of the cell cycle and thus plays a role in causing cancer. the paper mentions a more indirect role: "Interestingly, we noted an enrichment of integrations in most centromeres, suggesting a possible mechanism where HPV exploits this structural machinery to facilitate integration."
 