133 SEQUENCING AND ANNOTATION OF THE GENOME OF THE HOLSTEIN COW
R. Lilleoja A, E. Reimann A,B, Ü. Jaakma A and S. Köks A,B
A Estonian University of Life Sciences, Tartu, Estonia;
B University of Tartu, Tartu, Estonia
Reproduction, Fertility and Development 24(1) 179-179 https://doi.org/10.1071/RDv24n1Ab133
Published: 6 December 2011
Abstract
This paper presents preliminary results of whole-genome resequencing of the Holstein cow using the SOLiD 4 System. The aim of this study was to obtain a high-quality Holstein cow genome reference sequence that could serve as a reference for genomic studies of Estonian Holstein cattle. The new reference sequence will also be made available to other research groups. We generated one mate-paired library and one fragment library from 30 μg of genomic DNA, and the libraries were sequenced on 4 flow cells. Colour-space FASTA files (.csfasta) and the corresponding quality files (.qual) were mapped and paired to the cow (Bos taurus) reference genome assembly of October 2007 (Baylor 4.0/bosTau4). Mapping and pairing were performed with the Max Mapper algorithm implemented in the BioScope software (version 1.3). Initial sequencing produced 2 842 744 008 fifty-base-pair reads. With a mismatch penalty of –2.00 and clearzone 5, the average mapping efficiency was 73.3%. Altogether, 2 065 066 215 reads (92 778 710 937 bp) were successfully mapped, giving 35.2-fold coverage. Pairing indicated an insert-size range of 665 to 2195 bp, with a mean insert size of 1363 bp.

Tertiary analysis identified 5 472 870 SNPs in the cow genome, of which 3 517 351 were heterozygous and 1 955 519 homozygous. Transitions accounted for 3 747 199 SNPs and transversions for 1 093 307, with a transition/transversion ratio of 2.17:1.00. Annotation showed that only 889 901 of the discovered SNPs were listed in dbSNP, so the remaining 4 582 969 SNPs were novel.

We detected 144 035 large indels, of which 68 817 were heterozygous and 75 218 homozygous. The longest deletion was 15 089 bp, and 18 deletions were between 10 000 and 20 000 bp long. The largest insertion size class was 1000 to 5000 bp and contained 358 insertions. The most numerous deletion size classes were 200 to 500 bp and 100 to 200 bp, which together contained 114 578 deletions. In total, large indels accounted for 48 582 675 bp of the genome.

Analysis of small indel polymorphisms identified 452 113 small indels, of which 287 491 were heterozygous and 164 622 homozygous. Only 1197 small indels were listed in dbSNP. Most small indels (261 897) were single-nucleotide insertions or deletions. Small indels accounted for a total of 1 722 303 variant nucleotides in the genome. Finally, we identified 287 inversions in the cow genome, the largest spanning 151 000 bp. In conclusion, the cow genome contains a large amount of still unknown variation. Better knowledge of this variation could help explain significant phenotypic differences between breeds, for example in reproduction.
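The summary statistics reported in this abstract rest on simple definitions: a transition exchanges one purine for another (A and G) or one pyrimidine for another (C and T), any other base substitution is a transversion, a SNP counts as novel when it has no dbSNP entry, and mean fold coverage is total mapped bases divided by reference length. The Python sketch below illustrates these definitions on a small toy call set. It is not the BioScope pipeline used in the study: the `Variant` record, its field names, the `"het"`/`"hom"` genotype encoding, the half-open size-class edges in `SIZE_EDGES` and the `rs1`/`rs2` IDs are all hypothetical assumptions.

```python
"""Hypothetical sketch of variant summary statistics on toy data.

Not the study's BioScope pipeline: the Variant record, genotype encoding
and size-class edges below are illustrative assumptions.
"""
from dataclasses import dataclass
from typing import Iterable, Optional

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}

# Hypothetical indel size-class edges in bp, used as half-open intervals [lo, hi).
SIZE_EDGES = (1, 100, 200, 500, 1000, 5000, 10_000, 20_000)


@dataclass
class Variant:
    ref: str                        # single-base reference allele
    alt: str                        # single-base alternate allele
    genotype: str                   # hypothetical encoding: "het" or "hom"
    dbsnp_id: Optional[str] = None  # rsID if listed in dbSNP, else None


def is_transition(ref: str, alt: str) -> bool:
    """Transitions swap purine<->purine (A/G) or pyrimidine<->pyrimidine (C/T)."""
    pair = {ref.upper(), alt.upper()}
    return pair <= PURINES or pair <= PYRIMIDINES


def summarize_snps(snps: Iterable[Variant]) -> dict:
    """Count zygosity, transitions/transversions and novel (non-dbSNP) SNPs."""
    stats = {"total": 0, "het": 0, "hom": 0, "ts": 0, "tv": 0, "novel": 0}
    for v in snps:
        stats["total"] += 1
        stats[v.genotype] += 1  # assumes genotype is exactly "het" or "hom"
        stats["ts" if is_transition(v.ref, v.alt) else "tv"] += 1
        if v.dbsnp_id is None:
            stats["novel"] += 1
    # Transition/transversion ratio; NaN when no transversions were seen.
    stats["ts_tv"] = stats["ts"] / stats["tv"] if stats["tv"] else float("nan")
    return stats


def size_class(length: int, edges=SIZE_EDGES) -> str:
    """Return the [lo, hi) bp size class containing an indel length."""
    if length < edges[0]:
        raise ValueError("indel length must be at least edges[0]")
    for lo, hi in zip(edges, edges[1:]):
        if length < hi:  # edges ascend, so lo <= length here
            return f"{lo}-{hi} bp"
    return f">={edges[-1]} bp"


def mean_coverage(mapped_bases: int, reference_length: int) -> float:
    """Mean fold coverage: total mapped bases / reference length."""
    return mapped_bases / reference_length


if __name__ == "__main__":
    toy = [  # toy calls, not data from the study
        Variant("A", "G", "het", "rs1"),  # transition, known
        Variant("C", "T", "hom"),         # transition, novel
        Variant("A", "C", "het"),         # transversion, novel
        Variant("G", "A", "hom", "rs2"),  # transition, known
        Variant("T", "G", "het"),         # transversion, novel
    ]
    print(summarize_snps(toy))
    # {'total': 5, 'het': 3, 'hom': 2, 'ts': 3, 'tv': 2, 'novel': 3, 'ts_tv': 1.5}
    print([size_class(n) for n in (150, 350, 15_089)])
    # ['100-200 bp', '200-500 bp', '10000-20000 bp']
    print(mean_coverage(350, 100))  # 3.5
```

The half-open intervals are one way to keep a boundary length such as 200 bp from being counted in two classes; the abstract does not state which boundary convention BioScope applied. Applied to the counts reported above, the novelty definition is a simple subtraction: 5 472 870 total SNPs minus 889 901 dbSNP entries gives 4 582 969 novel SNPs.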
This study was supported by the European Regional Development Fund together with the Archimedes Foundation, target finance grant SF1080045s07 from the Ministry of Education and Science, grant P8001 from the Estonian University of Life Sciences and Estonian Science Foundation grant GARFS7479.