Bioinformatics: High-Throughput Cancer Research

Updated: 2023-06-11 08:05:01


Cancer Research Based on High-Throughput Sequencing Technology

Lin Zhao, linzhao@

Cancer Background

CANCER GENOMICS

Cancers are caused by changes that have occurred in the DNA sequence of the genomes of cancer cells.

Characteristic: high heterogeneity across different cancer tissues and different stages of development

Target:
a comprehensive catalogue of somatic mutations in cancer samples
identification of further potentially druggable cancer genes
utility of somatic mutations as biomarkers for prognosis

Hypothesis-driven

Data-driven, large-scale analysis

Problems and difficulties of classical methods

Unable to detect rare variants; detection is limited to common variants (MAF > 5%), although rare SNPs can be true disease-risk variants.

Classical methods looked only at cancer cells and sequenced genes known or suspected to be linked to cancer, so they may have overlooked key mutations, especially new ones.

Candidate genes are chosen by hypothesis, with long cycle times and a low success rate.
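The MAF cutoff mentioned here can be illustrated with a small calculation. This is only a sketch: the function names and the allele counts are hypothetical, and the 5% threshold is the one quoted in the text.

```python
def minor_allele_frequency(ref_count: int, alt_count: int) -> float:
    """Frequency of the less common allele among all observed alleles."""
    total = ref_count + alt_count
    if total == 0:
        raise ValueError("no alleles observed")
    return min(ref_count, alt_count) / total


def is_common(maf: float, threshold: float = 0.05) -> bool:
    """True when the variant is above the MAF cutoff quoted in the text."""
    return maf > threshold


# Hypothetical cohort of 100 diploid individuals (200 alleles per site).
maf_common = minor_allele_frequency(ref_count=170, alt_count=30)
maf_rare = minor_allele_frequency(ref_count=198, alt_count=2)

print(maf_common, is_common(maf_common))
print(maf_rare, is_common(maf_rare))
```

A variant like the second one falls below the cutoff, which is the kind of site the classical approach would miss.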

All of these problems can be solved by sequencing.

It's time to sequence!

MR Stratton et al. Nature 458, 719-724 (2009)

Overview of Cancer Solutions

Exome sequencing
Research design: 100 tumor and 100 control samples, 50X per sample
Deliverables: SNVs, indels

Whole genome sequencing
Research design: 10 groups (blood + tumor tissue), 30X per sample
Deliverables: SNVs, indels, CNVs, SVs, virus integrations or rearrangements

Cell line
Research design: whole genome sequencing, 50X with 170-800 bp PE libraries; 20X with 2k-40k bp PE libraries
Deliverables: SNVs, SVs, novel sequence by assembly

Single-cell sequencing
Research design: 50X exome of 20 normal and 100 tumor single cells
Deliverables: SNVs, indels

Cancer Solution 1: Exome sequencing

100 tumor and 100 control samples, >50X per sample

Background:
High heterogeneity within the same cancer tissue
Hundreds of cases must be sequenced to identify a cancer gene that is mutated in ...

Scientific goal:
Detect most of the somatic mutations
Try to distinguish driver mutations from passenger mutations

Analysis Pipeline

Exome sequencing: >50X depth
Quality control
Alignment to the reference genome with SOAPaligner

SNVs:
SNVs detected by SNVdetector or other software
Potential somatic SNVs
Excluding SNVs in dbSNP/YH/1000 Genomes
Somatic mutations

Indels (short reads):
Indels detected by SoapSV or other software
Filtering out indels found in normal tissues
Excluding indels in dbSNP/YH/1000 Genomes
Somatic indels
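The two filtering steps in the pipeline (drop what is also seen in the matched normal tissue, then drop what is already catalogued in dbSNP/YH/1000 Genomes) amount to set subtraction. The sketch below illustrates that idea only; the `Variant` type, the function name and the toy records are hypothetical and do not reproduce how SNVdetector or SoapSV work internally.

```python
from typing import Iterable, NamedTuple, Set


class Variant(NamedTuple):
    chrom: str
    pos: int
    ref: str
    alt: str


def somatic_candidates(
    tumor: Iterable[Variant],
    normal: Iterable[Variant],
    known: Iterable[Variant],
) -> Set[Variant]:
    """Keep tumor calls absent from the matched normal and from known databases."""
    return set(tumor) - set(normal) - set(known)


# Toy records, invented for illustration only.
tumor_calls = [
    Variant("chr1", 1000, "A", "G"),   # also in normal -> germline
    Variant("chr2", 2000, "C", "T"),   # listed as known polymorphism
    Variant("chr3", 3000, "G", "GA"),  # tumor-only and unlisted -> candidate
]
normal_calls = [Variant("chr1", 1000, "A", "G")]
known_sites = [Variant("chr2", 2000, "C", "T")]

candidates = somatic_candidates(tumor_calls, normal_calls, known_sites)
print(sorted(candidates))
```

Matching on chromosome, position, reference and alternate allele is the simplest possible key; real callers also weigh read support and quality, which this sketch leaves out.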

Sequencing Data Production

Panels: Normal sequencing analysis; Tumor sequencing analysis. The two data blocks are listed in source order; the source does not preserve which block belongs to which panel.

Columns: Reads = total effective reads (M); Yield = total effective yield (Mb); On target = effective sequence on target (Mb); Depth = average sequencing depth on target; Coverage = coverage of target region.

Sample   Reads   Yield     On target   Depth   Coverage
GC-201   40.08   2930.61   1075.9      31.54   95.5%
GC-202   37.04   2831.84   971.22      28.47   94.8%
GC-203   32.16   2395.21   824.17      24.16   94.8%
GC-204   32.7    2433.29   851.02      24.95   95.1%
GC-205   37.62   2864.62   1040.93     30.52   95.0%
GC-206   35.96   2728.28   986.48      28.92   95.2%
GC-207   32.1    2381.05   865.74      25.38   94.6%
GC-208   37.15   2823.45   1024.37     30.03   95.0%
GC-209   34.95   2644.37   995.13      29.18   95.3%
GC-210   44.38   3550.2    1397.8      40.98   95.5%

Sample   Reads   Yield     On target   Depth   Coverage
GC-201   11.7    856.08    334.59      9.81    92.7%
GC-202   11.76   861.88    302.87      8.88    90.8%
GC-203   11.75   808.88    290.43      8.51    91.8%
GC-204   11.83   823.36    281.15      8.24    92.5%
GC-205   21.44   1558.41   550.05      16.13   94.3%
GC-206   12.19   899.66    318.27      9.33    93.2%
GC-207   12.46   915.95    321.69      9.43    92.0%
GC-208   21.02   1509.57   529.31      15.52   94.3%
GC-209   21.52   1558.94   549.31      16.1    94.6%
GC-210   9.33    746.08    293.7       8.61    92.2%
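Two quantities can be derived from the GC-201 figures reported above: the fraction of the yield that lands on target, and the target size implied by the reported depth. This is a rough sketch; the function names are hypothetical, the target size is not stated in the source, and the second function assumes that average depth equals on-target bases divided by target size.

```python
def on_target_fraction(on_target_mb: float, yield_mb: float) -> float:
    """Share of the effective yield that maps to the capture target."""
    return on_target_mb / yield_mb


def implied_target_size_mb(on_target_mb: float, depth: float) -> float:
    """Target size in Mb, assuming depth = on-target bases / target size."""
    return on_target_mb / depth


# GC-201 values copied from the two data blocks (yield, on target, depth).
deep = {"yield_mb": 2930.61, "on_target_mb": 1075.9, "depth": 31.54}
shallow = {"yield_mb": 856.08, "on_target_mb": 334.59, "depth": 9.81}

for label, row in (("deeper block", deep), ("shallower block", shallow)):
    frac = on_target_fraction(row["on_target_mb"], row["yield_mb"])
    size = implied_target_size_mb(row["on_target_mb"], row["depth"])
    print(f"{label}: on-target fraction {frac:.3f}, implied target {size:.1f} Mb")
```

Both blocks point to a target of roughly 34 Mb, which suggests the depth column was computed this way, though the source does not say so.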

Schematic diagram of SNV filtering process and gene annotation

8277 somatic SNVs
7517 present in dbSNP and the 1000 Genomes Project
760 (9.2%) new SNVs
346 synonymous and UTR SNVs
414 (54.5%) nonsynonymous and splice-site SNVs
357 predicted cancer genes
113 recorded in COSMIC
244 novel predicted cancer genes

249 randomly selected SNVs for technical validation
216 (86.7%) validated
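The counts in this filtering funnel are internally consistent, which a few lines of arithmetic confirm. The numbers are taken from the slide; the variable and function names are only illustrative.

```python
def pct(part: int, whole: int) -> float:
    """Percentage rounded to one decimal place, as reported on the slide."""
    return round(100 * part / whole, 1)


total_snvs = 8277
in_databases = 7517
novel = total_snvs - in_databases          # SNVs not in dbSNP / 1000 Genomes
synonymous_or_utr = 346
nonsyn_or_splice = 414
in_cosmic, novel_genes = 113, 244
selected, validated = 249, 216

print(novel, pct(novel, total_snvs))                 # new SNVs and their share
print(pct(nonsyn_or_splice, novel))                  # share of new SNVs kept
print(in_cosmic + novel_genes)                       # predicted cancer genes
print(pct(validated, selected))                      # validation rate
```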

SNV profile

SNV spectrum

SNVs location

Transcription factor network in 3 pathways

The expression alteration of MUC17

Patients carrying MUC17 variants showed a better prognosis than those with wild-type MUC17.

Cancer Solution 2: Whole Genome Sequencing

10 groups (blood/normal tissue + tumor tissue), 30X per sample

Background: The whole genome must be surveyed, including intron and promoter regions, to find mutations.

Research: Large-scale analyses of genes in tumors have shown that the mutation load in cancer is abundant, heterogeneous, and widespread.
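To get a feel for the scale of this design, the raw data volume can be estimated from sample count, depth and genome size. This is a back-of-the-envelope sketch: the function name is hypothetical, the genome size of about 3.1 Gb is an approximation not given in the source, and each group is assumed to contain one normal and one tumor sample.

```python
HUMAN_GENOME_GB = 3.1  # approximate haploid human genome size in Gb (assumption)


def raw_bases_gb(samples: int, depth: float, genome_gb: float = HUMAN_GENOME_GB) -> float:
    """Total sequenced bases (Gb) needed to reach the given depth on every sample."""
    return samples * depth * genome_gb


groups = 10
samples_per_group = 2   # assumed: one blood/normal sample plus one tumor sample
depth = 30              # 30X per sample, as stated in the design

total_gb = raw_bases_gb(groups * samples_per_group, depth)
print(f"about {total_gb:.0f} Gb of raw sequence for the whole study")
```

The estimate ignores duplicates, low-quality reads and unmapped reads, so the amount actually sequenced would need to be somewhat higher.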

Workflow

DNA sample preparation
Library construction
HiSeq 2000 sequencing
Alignment
Basic bioinformatics analysis
Advanced bioinformatics analysis
Personalized bioinformatics analysis

Analysis modules:
SNV calling, SNV annotation
Short InDel calling, InDel annotation
CNV calling, CNV annotation
SV calling, SV annotation
Demographic analysis
Selection
Others

Mutations Summary

Cancer Solution 3: Cell line

Introduction: Human immortal cancer cell lines are an accessible, easily usable set of biological models.

Advantages:
1. Gives a very clear picture of what happened in the cell line.
2. Builds a systematic characterization of the genetics and genomics.
3. High-accuracy SV and CNV information with a clear pattern.
