Vijini Mallawaarachchi
Mar 27, 2020


In reality, analyses of this kind start with blood samples obtained from patients. After library preparation, these samples are sequenced, and we then have to separate the virus sequences from the human sequences in the resulting data. Since virus sequences have a very low abundance within samples compared to human sequences, we have to sequence at great depth to obtain a significant number of virus sequences.

If you download the reads dataset provided for the 2019 novel coronavirus and check its size, you will see that it is over 21 GB of FASTQ data containing 56,565,928 reads. You cannot assemble this dataset on a typical notebook or desktop computer with 16 GB of RAM. The assembler I used (SPAdes) follows the de Bruijn graph approach, which builds a massive graph of k-mers obtained from all the reads and requires large amounts of memory [1]. For comparison, traditional short-read assemblers require around 256 GB of RAM for datasets with roughly 500 million reads [1,2]. I could not assemble this dataset on my desktop machine and had to use the resources of a supercomputing facility instead. These are very resource-intensive tasks. So in my opinion, assembling and determining the coronavirus genome is indeed a huge achievement in modern science, one that can greatly help in tackling a worldwide pandemic.
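To give a feel for why the de Bruijn graph approach is so memory-hungry, here is a minimal Python sketch of how such a graph can be built from reads. This is only an illustration and not how SPAdes is implemented: the function names `kmers` and `build_de_bruijn`, the toy reads, and the choice of k = 4 are all my own assumptions, and real assemblers also deal with sequencing errors, reverse complements and multiple k values.

```python
from collections import defaultdict


def kmers(read, k):
    """Yield every substring of length k (k-mer) found in a read."""
    for i in range(len(read) - k + 1):
        yield read[i:i + k]


def build_de_bruijn(reads, k):
    """Build a toy de Bruijn graph (hypothetical helper, not a SPAdes API).

    Nodes are (k-1)-mers. Each distinct k-mer adds one edge from its
    prefix (first k-1 bases) to its suffix (last k-1 bases).
    """
    graph = defaultdict(set)
    for read in reads:
        for kmer in kmers(read, k):
            graph[kmer[:-1]].add(kmer[1:])
    return graph


# Toy reads made up for illustration, not real sequencing data.
reads = ["ATGGCGT", "GGCGTGC", "CGTGCAA"]
graph = build_de_bruijn(reads, k=4)

# One edge is stored per distinct k-mer seen across all reads.
n_edges = sum(len(successors) for successors in graph.values())
print(len(graph), "nodes with outgoing edges,", n_edges, "edges")
```

Every distinct k-mer in the input becomes an edge that has to be kept in memory, so with tens of millions of reads the graph quickly grows far beyond what this dictionary-of-sets sketch (or a 16 GB machine) could hold.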

[1] Schatz MC, Delcher AL, Salzberg SL (2010) Assembly of large genomes using second-generation sequencing. Genome Res 20: 1165–1173. doi:10.1101/gr.101360.109.

[2] Kleftogiannis D, Kalnis P, Bajic VB (2013) Comparing memory-efficient genome assemblers on stand-alone and cloud infrastructures. PLoS ONE 8: e75505. doi:10.1371/journal.pone.0075505.

