https://www.thermofisher.com/sg/ ... -ngs-forensics.html
https://ioncommunity.thermofishe ... neration-sequencing
lobSTR: A short tandem repeat profiler for personal genomes
Short tandem repeats (STRs) have a wide range of applications, including medical genetics, forensics, and genetic genealogy. High-throughput sequencing (HTS) has the potential to profile hundreds of thousands of STR loci. However, mainstream bioinformatics pipelines are inadequate for the task. These pipelines treat STR mapping as gapped alignment, which results in cumbersome processing times and a biased sampling of STR alleles. Here, we present lobSTR, a novel method for profiling STRs in personal genomes. lobSTR harnesses concepts from signal processing and statistical learning to avoid gapped alignment and to address the specific noise patterns in STR calling. The speed and reliability of lobSTR exceed the performance of current mainstream algorithms for STR profiling. We validated lobSTR's accuracy by measuring its consistency in calling STRs from whole-genome sequencing of two biological replicates from the same individual, by tracing Mendelian inheritance patterns in STR alleles in whole-genome sequencing of a HapMap trio, and by comparing lobSTR results to traditional molecular techniques. Encouraged by the speed and accuracy of lobSTR, we used the algorithm to conduct a comprehensive survey of STR variations in a deeply sequenced personal genome. We traced the mutation dynamics of close to 100,000 STR loci and observed more than 50,000 STR variations in a single genome. lobSTR's implementation is an end-to-end solution. The package accepts raw sequencing reads and provides the user with the genotyping results. It is written in C/C++, includes multi-threading capabilities, and is compatible with the BAM format.
STRait Razor: a length-based forensic STR allele-calling tool for use with second generation sequencing data.
Recent studies have demonstrated the capability of second generation sequencing (SGS) to provide coverage of short tandem repeats (STRs) found within the human genome. However, there are relatively few bioinformatic software packages capable of detecting these markers in the raw sequence data. The extant STR-calling tools are sophisticated, but are not always applicable to the analysis of the STR loci commonly used in forensic analyses. STRait Razor is a newly developed Perl-based software tool that runs on the Linux/Unix operating system and is designed to detect forensically-relevant STR alleles in FASTQ sequence data, based on allelic length. It is capable of analyzing STR loci with repeat motifs ranging from simple to complex without the need for extensive allelic sequence data. STRait Razor is designed to interpret both single-end and paired-end data and relies on intelligent parallel processing to reduce analysis time. Users are presented with a number of customization options, including variable mismatch detection parameters, as well as the ability to easily allow for the detection of alleles at new loci. In its current state, the software detects alleles for 44 autosomal and Y-chromosome STR loci. The study described herein demonstrates that STRait Razor is capable of detecting STR alleles in data generated by multiple library preparation methods and two Illumina® sequencing instruments, with 100% concordance. The data also reveal noteworthy concepts related to the effect of different preparation chemistries and sequencing parameters on the bioinformatic detection of STR alleles.
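To make the idea of length-based allele calling concrete, here is a minimal Python sketch. It is not STRait Razor's actual implementation (which is Perl-based and supports mismatch tolerance and paired-end data); it only illustrates the general principle of locating two flanking sequences in a read and converting the distance between them into a repeat count. The flank sequences, the 4-base motif length, and the function names are all hypothetical, chosen for illustration.

```python
from collections import Counter

def call_allele_by_length(read, left_flank, right_flank, motif_len):
    """Return an allele label derived from the distance between two flanks.

    Returns None when either flank is missing from the read.
    Partial repeats use the usual forensic notation, e.g. "9.3" for
    nine full repeats plus three extra bases.
    """
    start = read.find(left_flank)
    if start == -1:
        return None
    start += len(left_flank)
    end = read.find(right_flank, start)
    if end == -1:
        return None
    full, partial = divmod(end - start, motif_len)
    return str(full) if partial == 0 else f"{full}.{partial}"

def tally_alleles(reads, left_flank, right_flank, motif_len):
    """Count how many reads support each allele label."""
    calls = (call_allele_by_length(r, left_flank, right_flank, motif_len) for r in reads)
    return Counter(c for c in calls if c is not None)

# Hypothetical locus definition: exact-match flanks around a 4-base repeat.
LEFT, RIGHT, MOTIF_LEN = "ACGTAC", "TTGCAA", 4

reads = [
    "CC" + LEFT + "GATA" * 7 + RIGHT + "GG",   # 7 full repeats
    LEFT + "GATA" * 7 + RIGHT,                 # 7 full repeats
    LEFT + "GATA" * 9 + "GAT" + RIGHT,         # 9 repeats + 3 bases
    "GGGGGGGGGGGG",                            # no flanks, ignored
]

counts = tally_alleles(reads, LEFT, RIGHT, MOTIF_LEN)
print(counts)
```

Because only the length between the flanks is used, this approach needs no catalogue of allele sequences, which matches the abstract's point about not requiring extensive allelic sequence data. The trade-off is that alleles of equal length but different sequence cannot be told apart.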
Using DNA evidence to solve crimes is something just about everyone is familiar with, publicized in crime dramas such as the television program CSI (Crime Scene Investigation). Yet as a technique for human identification, a Restriction Fragment Length Polymorphism (RFLP) marker was first described in 1980, followed shortly by the Polymerase Chain Reaction in 1985 and by Variable Number Tandem Repeat (VNTR) markers, described in 1986 by Sir Alec Jeffreys. (He would later describe Short Tandem Repeats in 1991.) The national database in the US had its origins in 1988, and the UK National DNA Database [NDNAD] in 1995. In 1998 the Federal Bureau of Investigation (FBI) combined several pre-existing databases into the Combined DNA Index System [CODIS], which has since amassed over 11 million profiles in the US. The NDNAD in the UK has over 5 million records. These DNA profiling databases do not hold names or personal identifiers, only the DNA profile, an identifier for the agency that submitted the profile, a specimen identification number, and the DNA laboratory personnel involved with the DNA analysis. (More information is available about CODIS and the National DNA Index System.)
The DNA profile itself consists of a standard set of human identification markers, which at present are Short Tandem Repeats (STRs). A single STR allele is typically shared by only 5 to 20% of individuals in a given population; that is, for any given allele there is a 1:20 to 1:5 probability of identity between two random individuals. The US CODIS system uses 13 STR markers, while the UK uses a compatible subset of 11 STR markers, upon which a set of legal precedents has been established. The DNA profile record is one or two alleles at each of the 13 or 11 loci.
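To see why a panel of markers is so discriminating even though each one alone is weak, the arithmetic can be sketched as follows. This is a deliberately simplified illustration, not a casework calculation: it treats the 1:5 to 1:20 per-allele figure above as a per-locus match probability and assumes the 13 loci are statistically independent, so the probabilities simply multiply. Real random match probabilities are computed from genotype frequencies in population databases. The function name is hypothetical.

```python
from fractions import Fraction

def profile_match_probability(per_locus_probs):
    """Multiply per-locus match probabilities (assumes independent loci)."""
    result = Fraction(1)
    for p in per_locus_probs:
        result *= Fraction(p)
    return result

N_LOCI = 13  # size of the CODIS core set mentioned in the text

# Least discriminating case: every locus matches with probability 1/5.
upper = profile_match_probability([Fraction(1, 5)] * N_LOCI)
# Most discriminating case: every locus matches with probability 1/20.
lower = profile_match_probability([Fraction(1, 20)] * N_LOCI)

print(f"1 in {upper.denominator:,}")  # 1 in 1,220,703,125
print(f"1 in {lower.denominator:,}")  # 1 in 81,920,000,000,000,000
```

Even under the least favorable assumption, the chance of two random individuals matching at all 13 loci is roughly one in a billion under this simplified model.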
Applied Biosystems instruments and reagents have been trusted for over 20 years for many types of genetic analysis, and include a comprehensive line of products for STR-based forensics. If you are interested in further background and applications of STR analysis, here’s a recent post that describes microsatellite discovery using next-generation sequencing. And yet there are unmet needs where a next-generation sequencing approach can help. In particular, where the starting material has been fragmented (for example in mass disaster situations), the shorter amplicons afforded by single-nucleotide polymorphism (SNP) detection can be much more effective, because STR amplification requires input DNA that is intact enough to amplify a larger amplicon.
Of course, a ‘larger amplicon’ can be very large, on the order of kilobases in length, depending on the locus chosen. For degraded samples, a multiplexed PCR assay for SNP-based identification with amplicons only 150 or 200 bases in length means a much higher assay success rate.
Additionally, when a crime scene sample does not match any profile in the existing databases, knowing the ancestry or other phenotypic traits of the donor may help generate investigative clues. Mixed samples are also very frequently recovered from crime scenes; certain trace DNA samples from handgun triggers have been known to contain DNA from up to 7 individuals! This is where next-generation sequencing may help deconvolute the identities in these mixed samples.
To this end an HID-Ion AmpliSeq™ Identity Panel has been developed with 124 markers, comprising 34 Y-clade SNPs and another 90 autosomal SNPs. According to that webpage, the multiplexed assay offers “high discrimination power comparable to the 13 CODIS core STR loci”. In addition, an HID-Ion AmpliSeq™ Ancestry Panel has been developed with 165 markers to help generate investigative leads.
One recent publication, “Single nucleotide polymorphism typing with massively parallel sequencing for human identification” in the International Journal of Legal Medicine, used an earlier version of the Ion AmpliSeq™ Identity Panel (called v0.1), with 103 autosomal and 33 Y-SNPs. Four samples were tested at varying amounts of input DNA, from the recommended 10 ng down to 100 pg, using the Ion Torrent PGM™ System and Ion 314™ Chip. Interestingly, this group made a cross-platform comparison and concluded that “Overall, the data support that genotyping a large battery of SNPs is feasible with massively parallel sequencing”.