Next Generation Sequencing (NGS) has permanently transformed genetics and molecular biology, enabling a deeper understanding of DNA and RNA.
Since its introduction, NGS has far surpassed the capabilities of traditional methods such as Sanger sequencing. Unlike those earlier methods, NGS sequences millions of DNA fragments in parallel, significantly reducing time and cost while increasing the amount of data generated.
NGS technology is essential for a wide range of applications, from whole genome sequencing to more targeted studies like exome sequencing, transcriptomics, epigenomics, and more.
This article will guide you through the basic concepts, current technologies, applications, and challenges of NGS, with a practical approach based on our own experience.
Main NGS Sequencing Technologies
Sequencing by Synthesis (Illumina)
Sequencing by synthesis, the approach used by Illumina platforms, is the most commonly used method in NGS. Fluorescently labeled nucleotides are incorporated one at a time into a growing DNA strand, and a camera records the signal emitted at each step of the process.
Workflow:
First, the sample DNA is fragmented and specific adapters are added that serve three key functions:
Attachment: Allow DNA fragments to adhere to the surface of the flow cell during sequencing.
Amplification: Facilitate the local amplification of each DNA fragment in the cell, creating clusters of identical fragments.
Sequencing: Contain sequences that serve as binding sites for sequencing primers. Each synthesis cycle adds a nucleotide to the growing strand, emitting a fluorescent signal that is recorded.
Advantages:
High accuracy, scalability, and the ability to generate a large amount of data. Illumina is the preferred option for whole genome sequencing projects and studies of genetic variation.
Limitations:
The main limitation is the read length, which is usually short (between 50 and 300 base pairs). This can make it difficult to resolve repetitive or complex regions of the genome.
Semiconductor Sequencing (Ion Torrent)
Workflow:
Similar to Illumina, but instead of using fluorescent signals, this technology detects the pH change caused by the hydrogen ions released each time a nucleotide is incorporated.
Advantages:
Speed and low cost, suitable for applications where time is a critical factor, such as clinical sequencing.
Limitations:
Lower accuracy compared to Illumina, especially in detecting complex genetic variations.
Third-Generation Technologies: PacBio and Oxford Nanopore
Third-generation technologies, such as Pacific Biosciences (PacBio) and Oxford Nanopore, represent a significant advance in sequencing, allowing much longer reads without the need for amplification.
PacBio:
Uses single-molecule real-time (SMRT) sequencing, in which a polymerase attached to a surface copies a single DNA molecule while each nucleotide incorporation is detected in real time. This allows long reads (20 kb or more). Ideal for studying structural variations, repetitive regions, and de novo genome sequencing.
Oxford Nanopore:
This technology allows the sequencing of individual DNA molecules as they pass through a protein pore. It is highly portable and allows real-time sequencing with variable read lengths.
Advantages: Long reads that resolve complex genome regions, useful for whole genome sequencing, haplotype studies, and structural variation analysis.
Limitations: Higher costs and higher error rates than second-generation technologies, partly because these platforms are newer.
NGS Workflow
Sample Collection and Preparation
The NGS process begins with the collection of biological samples, which can include DNA, RNA, or both, depending on the study's objectives. Samples can come from various sources, such as blood, tissue, cultured cells, or even environmental samples.
Importance of Sample Quality: The integrity of genetic material is critical to ensuring the quality of the data obtained. DNA degradation or contamination can affect the results or prevent the analysis from being performed.
Considerations for RNA: In the case of RNA sequencing (RNA-Seq), it is important to select high-quality RNA and eliminate any potential genomic DNA contamination.
Library Preparation: Key Concepts and Procedures
Library preparation is one of the most critical steps in NGS. A library is a collection of DNA fragments prepared for sequencing, each with specific adapters ligated at both ends.
DNA Fragmentation: DNA is fragmented into small pieces, typically between 200 and 600 base pairs, using enzymatic or mechanical methods. These fragments are then purified and prepared for the next step.
Adapter Addition: Adapters are synthetic DNA sequences that are ligated to the ends of DNA fragments. These adapters are essential for the binding of fragments to the flow cell surface and for subsequent amplification and sequencing.
PCR Amplification: In some cases, DNA fragments are amplified by PCR to increase the amount of DNA available for sequencing. However, this step can introduce biases and artifacts, so it must be handled carefully.
Multiplexing and Sample Pooling
Multiplexing allows multiple samples to be sequenced in a single sequencing run, optimizing resources and reducing costs. This process uses specific barcodes (also known as indices) that are added to DNA fragments during library preparation.
Barcodes and Demultiplexing: Each sample receives a unique barcode. After sequencing, the data are demultiplexed using these barcodes, allowing the sequences corresponding to each sample to be identified.
Optimizing Coverage: The number of samples that can be multiplexed depends on the coverage depth required for each sample. It is important to calculate the optimal number of samples to ensure that each one receives the necessary coverage without compromising data quality.
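The calculation described above can be sketched in a few lines of Python. This is a minimal estimate, not a vendor calculator: the function name, the 80% usable-yield figure, and the run output in the example are illustrative assumptions that should be replaced with the values for your own instrument and kit.

```python
import math


def max_samples_per_run(run_output_bases, target_size_bases,
                        target_coverage, usable_fraction=0.8):
    """Estimate how many samples fit in one run at the requested coverage.

    usable_fraction is an assumed allowance for reads lost to low quality,
    duplicates, or off-target sequence; 0.8 is a placeholder, not a spec.
    """
    if min(run_output_bases, target_size_bases, target_coverage) <= 0:
        raise ValueError("all sizes and the coverage must be positive")
    if not 0 < usable_fraction <= 1:
        raise ValueError("usable_fraction must be in (0, 1]")

    # Bases needed so that one sample reaches the target coverage.
    bases_per_sample = target_size_bases * target_coverage
    # Round down: a fractional sample would fall short of its coverage.
    return math.floor(run_output_bases * usable_fraction / bases_per_sample)


# Hypothetical example: a 120 Gb run, a 50 Mb exome target, 100x per sample.
print(max_samples_per_run(120e9, 50e6, 100))
```

Rounding down is deliberate: adding one more sample than the estimate allows would push every sample in the pool below the coverage it needs.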
Applications and Benefits of NGS
Whole Genome Sequencing vs. Targeted Sequencing
NGS can be used to sequence an organism's entire genome or to sequence specific regions of interest, such as exomes or specific gene sets.
Whole Genome: Provides a comprehensive view of all the DNA in an organism. It is ideal for exploratory studies or to fully characterize a new genome. However, it is more expensive and generates a large amount of data that requires complex analysis.
Targeted Sequencing: Focuses on specific regions, such as the exome (all coding regions of the genome) or a predefined set of genes. It is more economical and is used when there is a specific study goal, such as searching for mutations in known genes.
Identification of SNPs and Structural Variants
NGS is a powerful tool for identifying genetic variations, including small variants such as single nucleotide polymorphisms (SNPs) and larger structural variations, such as insertions, deletions, and duplications.
SNPs: SNPs are the most common form of genetic variation and can influence how a gene is expressed or how an individual responds to certain medications.
Structural Variants: These include rearrangements of large DNA segments, which can be difficult to detect with short-read technologies but are easier to identify with third-generation technologies due to their long reads.
Uses in Research and Personalized Medicine
NGS has revolutionized personalized medicine, enabling treatments based on an individual's genetic profile. This is particularly important in oncology, where mutations in specific genes can guide the choice of targeted therapies.
Oncology: NGS is used to identify mutations in tumors that can be treated with targeted therapies. It also allows monitoring the evolution of cancer and resistance to treatments.
Rare Diseases: The ability to sequence entire genomes has facilitated the discovery of genetic variants associated with rare diseases, providing accurate diagnoses and new opportunities for therapy development.
Technical Challenges and Considerations
Limitations of Sequencing Technologies
Despite advances, sequencing technologies have limitations that must be considered when designing an experiment.
Accuracy vs. Long Reads: Technologies like Illumina offer high accuracy but with short reads, which may limit the resolution of complex genomic regions. Third-generation technologies offer longer reads but with higher error rates.
Costs: Although NGS costs have decreased, they remain high, especially for large-scale projects or those requiring third-generation technology. It is important to balance the budget with scientific goals.
Sequencing Coverage: Importance and Calculation
Sequencing coverage refers to the number of times a particular region of the genome has been sequenced. High coverage is crucial to ensuring that the results are accurate and reproducible.
Coverage Calculation: The coverage to aim for depends on the type of variant to be detected and the confidence required. For example, in whole genome sequencing, coverage of at least 30x is usually sought to ensure the detection of SNPs and small indels.
Considerations for Multiplexing: Multiplexing too many samples can reduce coverage and compromise data quality. It is important to use cost-coverage calculators to determine the optimal number of samples to include in a sequencing run.
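As a rough guide, average coverage can be estimated with the standard relation below, where N is the number of reads, L the read length in bases, and G the size of the genome or target region. The worked numbers (a human genome of about 3.1 Gb and 150 bp reads) are illustrative assumptions, not values from a specific run.

```latex
C = \frac{N \times L}{G}
\qquad\Longrightarrow\qquad
N = \frac{C \times G}{L}
  = \frac{30 \times 3.1 \times 10^{9}}{150}
  = 6.2 \times 10^{8}\ \text{reads}
```

This is an average: it ignores duplicates, unmapped reads, and uneven coverage across the genome, so it is sensible to plan for some margin above the calculated read count.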
Costs and Optimization in Sequencing Projects
The cost of NGS varies depending on the technology used, the scale of the project, and the sequencing depth required.
Cost Reduction Strategies: Multiplexing and the use of targeted sequencing can significantly reduce costs without sacrificing data quality. It is also helpful to consider collaborations with other institutions to share the expenses of large sequencing projects.
Bioinformatic Analysis of NGS Data
Read Alignment and Assembly
After sequencing, raw reads must be aligned against a reference genome or assembled de novo if no reference genome is available.
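To make the idea of alignment concrete, here is a deliberately naive sketch that slides a read along a reference and keeps the position with the fewest mismatches. The function names and the mismatch limit are assumptions for illustration. Production aligners rely on indexed references and handle insertions and deletions, which this toy version does not.

```python
def hamming(a, b):
    """Count mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def best_alignment(read, reference, max_mismatches=2):
    """Return (position, mismatches) of the best ungapped placement, or None.

    Brute force: every start position is tried, which is only workable
    for toy sequences. Ties are resolved in favor of the leftmost hit.
    """
    best = None
    for pos in range(len(reference) - len(read) + 1):
        window = reference[pos:pos + len(read)]
        mismatches = hamming(read, window)
        if mismatches <= max_mismatches and (best is None or mismatches < best[1]):
            best = (pos, mismatches)
    return best


reference = "ACGTTGCATGCCGATAGC"
print(best_alignment("GCATGCCG", reference))  # exact match
print(best_alignment("GCATGACG", reference))  # one mismatch
print(best_alignment("TTTTTTTT", reference))  # no acceptable placement
```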
Result Interpretation: Tools and Software
Bioinformatic analysis is a critical part of the NGS workflow. Numerous tools are available, each tailored to different types of analysis.
Variant Calling: Detects differences between the sample genome and the reference genome. Tools such as GATK and FreeBayes are popular for this task.
Functional Annotation: Links detected variants to known genes and predicts their potential impact on gene function. Tools like ANNOVAR or SnpEff are commonly used.
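The principle behind variant calling can be illustrated with a toy pileup: count the bases observed at each reference position and report positions where a non-reference base is frequent enough. This is not how GATK or FreeBayes work internally, since they use statistical models and handle indels. The thresholds and function name below are assumptions chosen for the example.

```python
from collections import Counter


def call_snps(reference, aligned_reads, min_depth=4, min_alt_fraction=0.3):
    """Naive SNP caller over ungapped alignments.

    aligned_reads is a list of (start_position, read_sequence) tuples.
    Returns (position, ref_base, alt_base, depth, alt_count) tuples.
    """
    # One base counter per reference position.
    pileup = [Counter() for _ in reference]
    for start, sequence in aligned_reads:
        for offset, base in enumerate(sequence):
            pos = start + offset
            if 0 <= pos < len(reference):
                pileup[pos][base] += 1

    calls = []
    for pos, counts in enumerate(pileup):
        depth = sum(counts.values())
        if depth < min_depth:
            continue  # too few reads to trust this position
        ref_base = reference[pos]
        alt_counts = {b: n for b, n in counts.items() if b != ref_base}
        if not alt_counts:
            continue
        alt_base, alt_count = max(alt_counts.items(), key=lambda kv: kv[1])
        if alt_count / depth >= min_alt_fraction:
            calls.append((pos, ref_base, alt_base, depth, alt_count))
    return calls


reference = "ACGTACGTAC"
reads = [
    (0, "ACGTTCGTAC"),
    (0, "ACGTTCGTAC"),
    (2, "GTTCGTAC"),
    (1, "CGTACGTA"),
    (3, "TTCGT"),
]
print(call_snps(reference, reads))
```

The minimum depth is where coverage matters: with too few reads over a position, a sequencing error cannot be told apart from a real variant.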
Demultiplexing and Data Organization
Demultiplexing: This is the process of separating reads by sample using the barcodes assigned during library preparation.
Software such as Illumina's bcl2fastq allows for the conversion of raw files into individual FASTQ files for each sample. This step is crucial to ensure that the data for each sample are analyzed correctly.
Data Organization and Storage: Given the large volume of data generated by NGS, it is essential to have an efficient data storage and management plan. Cloud storage platforms, such as AWS (which we use at Duponte) or Google Cloud, are common options for managing NGS data.
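The logic of demultiplexing can be sketched as follows. This is a simplified illustration and not a replacement for bcl2fastq: it assumes each read arrives as an (index sequence, read sequence) pair, and the sample names, barcodes, and one-mismatch tolerance are hypothetical choices for the example.

```python
def hamming(a, b):
    """Count mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def assign_sample(index_read, barcodes, max_mismatches=1):
    """Return the sample whose barcode matches, or None if none or several do."""
    matches = [
        sample
        for sample, barcode in barcodes.items()
        if len(barcode) == len(index_read)
        and hamming(barcode, index_read) <= max_mismatches
    ]
    # An ambiguous index (several candidate samples) is rejected, not guessed.
    return matches[0] if len(matches) == 1 else None


def demultiplex(reads, barcodes, max_mismatches=1):
    """Group read sequences by sample; unmatched reads go to 'undetermined'."""
    bins = {sample: [] for sample in barcodes}
    bins["undetermined"] = []
    for index_read, sequence in reads:
        sample = assign_sample(index_read, barcodes, max_mismatches)
        bins[sample if sample is not None else "undetermined"].append(sequence)
    return bins


barcodes = {"sample_A": "ATCACG", "sample_B": "CGATGT"}  # hypothetical sheet
reads = [("ATCACG", "ACGT"), ("CGATGA", "GGCC"), ("TTTTTT", "AAAA")]
print(demultiplex(reads, barcodes))
```

Tolerating a mismatch recovers reads with a sequencing error in the index, which only works safely when the barcodes in a pool differ from one another at several positions.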
Challenges in Data Analysis
Data Volume: NGS generates vast amounts of data that require powerful computing resources and efficient storage solutions.
Quality Control: Ensuring data quality at every step of the process is essential to obtaining reliable results. Quality control tools like FastQC help identify potential problems in sequencing data.
Interpretation: The interpretation of NGS results can be challenging, especially when dealing with complex diseases where multiple genetic factors may be involved. Collaboration with clinical geneticists and other specialists is often necessary to make accurate interpretations.
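As a small illustration of the quality-control step, the sketch below decodes the quality line of a FASTQ record and applies a mean-quality filter. It assumes the common Phred+33 encoding. The function names and the threshold of 20 are choices made for this example, and it does not reproduce what FastQC reports.

```python
def phred_scores(quality_string, offset=33):
    """Decode a FASTQ quality string into Phred scores (Phred+33 assumed)."""
    return [ord(ch) - offset for ch in quality_string]


def error_probability(phred):
    """Convert a Phred score Q to an error probability, P = 10^(-Q/10)."""
    return 10 ** (-phred / 10)


def mean_quality(quality_string, offset=33):
    """Average Phred score across the read."""
    scores = phred_scores(quality_string, offset)
    if not scores:
        raise ValueError("empty quality string")
    return sum(scores) / len(scores)


def passes_filter(quality_string, min_mean_quality=20.0):
    """Keep a read only if its mean quality reaches the chosen threshold."""
    return mean_quality(quality_string) >= min_mean_quality


print(phred_scores("II5!"))
print(passes_filter("IIII"), passes_filter("!!!!"))
```

A Phred score of 20 corresponds to a 1 in 100 chance that the base call is wrong, and a score of 30 to 1 in 1,000.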
As technology advances, the promise of NGS becomes more concrete but also raises important questions. How will it truly impact our treatments and diagnoses? And what challenges must we overcome to ensure everyone can benefit from these advancements? Don't wait for this technology to become available in your lab. Discover how Duponte can offer you these advantages today.