RAD-Seq for Genome Wide Association Studies

Dear NGS Expert Blog reader,

To kick off the 2014 discussion on RAD Sequencing for the NGS blog, I wanted to share some results from a recently published study describing the use of RAD for high throughput SNP genotyping in Miscanthus.

The tropical grass Miscanthus is an intriguing candidate for bioenergy crop development: it is well adapted to environments worldwide, does not require intensive agricultural effort to cultivate, and can produce large amounts of biomass. To illustrate this point, cultivars of Miscanthus giganteus can grow over 3.5 meters in a single year! With such promise as a bioenergy solution, a number of research groups are working to modernize Miscanthus breeding and to integrate genomic technologies that help develop superior varieties.

Our group at Floragenex assisted in one recently published study, which illustrates how RAD sequencing can rapidly generate sizeable molecular resources for a genome wide association study (GWAS). The goal of a GWAS is to identify genetic variants that are associated with specific traits observed in natural, unstructured populations. Some interesting highlights from this paper:

RAD-Seq identified over 100,000 single nucleotide variants (SNVs) across 138 Miscanthus plants. A large number of markers is advantageous for association studies, where understanding the organization of the genome at high resolution is key.

Without an assembled Miscanthus genome, we accomplished variant calling with a two-pronged approach:

  • a comparative genomics strategy using the Sorghum bicolor genome as a reference and
  • a de novo clustering approach using the Miscanthus RAD data.

Both methods were successful for high quality SNV discovery and genotyping.

After filtering, approximately 20,000 and 30,000 high quality markers, respectively, were genotyped across the Miscanthus population using the two approaches. Once genotyping was complete, the comprehensive genome wide association analysis described in the paper showed statistically significant marker-trait associations for seven key Miscanthus traits, including lignin content, plant moisture and stem diameter. These traits are important for bioprocessing of plant material, and the results suggest that marker-assisted and genomic selection could be effective tools in Miscanthus breeding.
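To make the idea of a marker-trait association concrete, here is a minimal sketch of a single-marker test: a trait is regressed on allele dosage (0, 1 or 2 copies of the alternate allele) at one SNV. This is an illustration only, not the analysis used in the paper; the function name and the toy data are assumptions, and a real GWAS would typically also account for population structure and correct for testing many markers.

```python
def marker_trait_association(genotypes, trait):
    """Regress a trait on allele dosage at a single marker.

    genotypes: allele dosages (0, 1, 2), with None for a missing call.
    trait: one phenotype value per sample, in the same order.
    Returns (slope, r_squared) from simple least-squares regression.
    """
    # Drop samples whose genotype is missing at this marker.
    pairs = [(g, t) for g, t in zip(genotypes, trait) if g is not None]
    n = len(pairs)
    if n < 3:
        raise ValueError("need at least three genotyped samples")

    mean_g = sum(g for g, _ in pairs) / n
    mean_t = sum(t for _, t in pairs) / n

    # Sums of squares and cross-products around the means.
    s_gg = sum((g - mean_g) ** 2 for g, _ in pairs)
    s_tt = sum((t - mean_t) ** 2 for _, t in pairs)
    s_gt = sum((g - mean_g) * (t - mean_t) for g, t in pairs)
    if s_gg == 0 or s_tt == 0:
        raise ValueError("marker or trait does not vary across samples")

    slope = s_gt / s_gg                     # trait change per allele copy
    r_squared = s_gt ** 2 / (s_gg * s_tt)   # variance explained by the marker
    return slope, r_squared


if __name__ == "__main__":
    # Hypothetical toy data: six plants, one marker, one trait.
    dosages = [0, 1, 2, 0, None, 2]
    stem_diameter = [4.1, 5.0, 6.2, 3.9, 5.1, 5.8]
    print(marker_trait_association(dosages, stem_diameter))
```

Repeating a test like this across tens of thousands of markers is what makes marker density, and strict control of false positives, so important in association studies.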

The full article, entitled “Genome-wide association studies and prediction of 17 traits related to phenology, biomass and cell wall composition in the energy grass Miscanthus sinensis”, was published in New Phytologist: http://www.ncbi.nlm.nih.gov/pubmed/24308815

As a co-author on this exciting publication, I would be happy to answer any of your questions on this paper, so do not hesitate to post them. For my next post, I will be comparing many of the new fractional sequencing technologies being utilized for NGS genotyping.

Rick Nipper,
President, Floragenex

About Rick Nipper

Rick stands for RAD-Seq. He is our guest expert blogger from Floragenex.

2 Responses to “RAD-Seq for Genome Wide Association Studies”

  1. Dear Rick Nipper,

    Nice article. I was wondering, how do you pick which restriction enzyme is best for you? How do you make sure you do not get cut sites in repetitive regions?

    • Rick Nipper

      Dear Peter,

      Thank you for your question. With RAD-Seq, the optimal enzyme is selected based on the application and desired outcome. For example, in the GWAS paper above, we needed a large number of markers, so we used an enzyme that cuts frequently in the genome (PstI). For studies where fewer markers are needed, we select an enzyme that digests the genome less frequently and produces a reduced amount of genomic sequence. A good example is SbfI, which was used in this 2010 paper.
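As a rough illustration of why a 6-base cutter such as PstI (CTGCAG) yields many more markers than an 8-base cutter such as SbfI (CCTGCAGG), the sketch below estimates expected cut sites under the simplifying assumption of equal, independent base frequencies, and also counts sites in a given sequence. The function names and the 1 Gb genome size are assumptions for illustration; base composition and methylation will shift real counts considerably.

```python
PSTI = "CTGCAG"    # 6-bp recognition site
SBFI = "CCTGCAGG"  # 8-bp recognition site (contains the PstI site)


def expected_cut_sites(genome_size_bp, recognition_site):
    """Expected number of sites if all four bases are equally likely."""
    return genome_size_bp / 4 ** len(recognition_site)


def count_cut_sites(sequence, recognition_site):
    """Count occurrences of a recognition site in a sequence.

    Both sites above are palindromic, so scanning one strand is enough.
    """
    sequence = sequence.upper()
    site = recognition_site.upper()
    count = 0
    start = sequence.find(site)
    while start != -1:
        count += 1
        # Step forward by one base so overlapping sites are not missed.
        start = sequence.find(site, start + 1)
    return count


if __name__ == "__main__":
    genome_size = 1_000_000_000  # hypothetical 1 Gb genome
    for name, site in (("PstI", PSTI), ("SbfI", SBFI)):
        print(name, round(expected_cut_sites(genome_size, site)))
```

Under this simple model each additional base in the recognition site reduces the expected number of cut sites fourfold, so an 8-base cutter is expected to cut 16 times less often than a 6-base cutter.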

      Regarding repeat elements, using methylation-sensitive restriction enzymes (such as PstI, SbfI, SgrAI and NotI) helps focus genome digestion and sequencing on the low-copy, gene-rich regions of most plant genomes. This helps us avoid most of the repetitive content when sequencing.

      Hope that helps!