Biomedical Genetics

Throughout my Honors Biomedical Genetics class, I engaged in many in-depth discussions with my professor, both in and out of class. We explored deeper questions about topics covered in lecture, such as the ethics of genetics, and topics relevant to the field today, such as genetic modification.

For the Honors Project in this class, I was assigned an unidentified DNA sequence. In addition to learning the fundamentals of genetics, I learned to analyze the sequence to determine which gene and species it came from (it was a beta hemoglobin sequence from Mus musculus, the house mouse). I was then able to design a gene therapy approach to correct the mutation I found in the assigned sequence.


Part 1 of Honors Project:

Name of the locus at which the assigned DNA sequence is found:

The locus name for my assigned DNA sequence is hemoglobin, beta adult t chain. I found this by aligning my sequence against the refseq_rna database in BLAST.

Most likely species of origin:

The most likely species of origin for my sequence is Mus musculus (house mouse). I ran my unknown DNA sequence through a BLAST search against the refseq_rna database and chose the hit with 100% identity; the accession page for that sequence listed the species name.
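As an illustration of what the percent identity figure means, here is a minimal sketch that counts matching columns in a pairwise alignment and divides by the alignment length. The function name and the toy sequences are my own and are not BLAST output.

```python
def percent_identity(aligned_query: str, aligned_subject: str) -> float:
    """Percent of alignment columns where both sequences carry the same base.

    Both strings must already be aligned (equal length, '-' for gaps).
    Gap columns count toward the alignment length but never as matches.
    """
    if len(aligned_query) != len(aligned_subject):
        raise ValueError("aligned sequences must have equal length")
    if not aligned_query:
        raise ValueError("alignment is empty")
    matches = sum(
        1
        for q, s in zip(aligned_query.upper(), aligned_subject.upper())
        if q == s and q != "-"
    )
    return 100.0 * matches / len(aligned_query)


# Toy 12-base sequences (hypothetical, for illustration only).
toy_query = "ATGGTGCACCTG"
print(percent_identity(toy_query, "ATGGTGCACCTG"))  # identical sequences
print(percent_identity(toy_query, "ATGGTGCACCTA"))  # one mismatch in 12 columns
```

A hit with 100% identity means every aligned column matched, which is why that hit was the natural choice for identifying the species.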

Chromosome and coordinates:

My sequence is found on chromosome 7. The accession page listed the chromosome but not the coordinates of the sequence. However, when I aligned my sequence against the reference genome assembly using the refseq_genome database in BLAST, I was able to find its location on the genome: Range 1: 103461631 to 103463230, and Range 2: 103475783 to 103477226.

Gene family:

My sequence is a member of the beta-globin gene family (β-globin locus). Genes within a gene family have similar biochemical functions. The hemoglobin subunits are encoded by 10 genes separated into two clusters on different chromosomes, the α-globin and β-globin loci.

A functional hemoglobin molecule requires products from both gene families: two β-globin family polypeptides and two α-globin family polypeptides, along with four small heme groups.

Sense or template strand?

My sequence is the template strand. When I aligned my DNA sequence with the Mus musculus mRNA sequence, BLAST reported the alignment as Plus/Minus. The mRNA corresponds to the sense (plus) strand, so a query that matches it only in the minus orientation is the template strand.
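The strand logic can be sketched in a few lines: if the mRNA (written as DNA) appears in the query as-is, the query is the sense strand; if it only appears in the query's reverse complement, the query is the template strand. The function names and the 15-base toy fragment are assumptions for illustration, and exact substring matching stands in for a real alignment.

```python
# Translation table for complementing DNA bases (upper and lower case).
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(dna: str) -> str:
    """Return the reverse complement of a DNA string."""
    return dna.translate(COMPLEMENT)[::-1]


def strand_of(query: str, reference_sense: str) -> str:
    """Report the orientation in which the sense-strand reference occurs in the query.

    'plus'  -> reference found in the query itself (query is the sense strand)
    'minus' -> reference found only in the reverse complement (query is the template strand)
    """
    q, ref = query.upper(), reference_sense.upper()
    if ref in q:
        return "plus"
    if ref in reverse_complement(q):
        return "minus"
    return "no match"


# Hypothetical toy fragment of a coding (sense) sequence.
sense_fragment = "ATGGTGCACCTGACT"
template_read = reverse_complement(sense_fragment)

print(strand_of(sense_fragment, sense_fragment))  # plus
print(strand_of(template_read, sense_fragment))   # minus
```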

Exons:

My sequence has 3 exons. I was able to identify the number of exons on my sequence by running it through the refseq_genome database in BLAST.

The locations of the exons in the mRNA sequence that matched my DNA sequence are: Exon 1: 1…146, Exon 2: 147…369, and Exon 3: 370…630.
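The exon coordinates above are 1-based and inclusive, so each exon's length is end minus start plus one. A short sketch (the helper names are my own) that computes the lengths and checks that the exons tile the mRNA without gaps:

```python
# Exon coordinates on the mRNA, 1-based and inclusive, as listed above.
exons = [(1, 146), (147, 369), (370, 630)]


def exon_lengths(coords):
    """Length of each exon for 1-based inclusive (start, end) pairs."""
    return [end - start + 1 for start, end in coords]


def are_contiguous(coords):
    """True if each exon starts right after the previous one ends."""
    return all(
        next_start == prev_end + 1
        for (_, prev_end), (next_start, _) in zip(coords, coords[1:])
    )


lengths = exon_lengths(exons)
print(lengths)                # [146, 223, 261]
print(sum(lengths))           # 630
print(are_contiguous(exons))  # True
```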

Part 2:

Identify (in bp) the intron-exon boundaries and splice sites of the gene in your assigned sequence:

The intron-exon boundaries and splice sites were as I expected, except at the beginning of the 2nd exon, where I found two mutations. First, the splice acceptor site has a substitution: the AA directly before the 2nd exon should be an AG.

The result of this mutation would be the loss of the 2nd exon. The splice donor site at the end of the first exon would be joined to the beginning of the third exon, skipping the second exon and producing a severely altered protein-coding sequence.
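To illustrate the exon-skipping outcome described above, here is a deliberately simplified sketch on a toy gene. The sequences, names, and the rule "an exon is kept only if the two bases before it are AG" are assumptions for illustration; real splicing also depends on other signals and can use cryptic splice sites instead.

```python
# Hypothetical toy gene: three short exons and two introns (GT...AG).
EXON1, EXON2, EXON3 = "ATGGTG", "CTGCTG", "CTCTAA"
INTRON2 = "GTGAGTCCACAG"


def build_gene(intron1):
    """Assemble the toy gene and return it with 0-based, half-open exon coordinates."""
    gene = EXON1 + intron1 + EXON2 + INTRON2 + EXON3
    exon2_start = len(EXON1) + len(intron1)
    exon3_start = exon2_start + len(EXON2) + len(INTRON2)
    exons = [
        (0, len(EXON1)),
        (exon2_start, exon2_start + len(EXON2)),
        (exon3_start, exon3_start + len(EXON3)),
    ]
    return gene, exons


def acceptor_site(gene, exon_start):
    """The two intron bases immediately upstream of an exon."""
    return gene[exon_start - 2:exon_start]


def spliced_mrna(gene, exons):
    """Join exons, skipping any downstream exon whose acceptor site is not AG."""
    kept = [gene[exons[0][0]:exons[0][1]]]
    for start, end in exons[1:]:
        if acceptor_site(gene, start) == "AG":
            kept.append(gene[start:end])
    return "".join(kept)


normal_gene, normal_exons = build_gene("GTAAGTTTTCAG")  # intron 1 ends in AG
mutant_gene, mutant_exons = build_gene("GTAAGTTTTCAA")  # AG -> AA substitution

print(spliced_mrna(normal_gene, normal_exons))  # ATGGTGCTGCTGCTCTAA
print(spliced_mrna(mutant_gene, mutant_exons))  # ATGGTGCTCTAA (exon 2 skipped)
```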

Second, there is an insertion of 4 nucleotides at the beginning of the 2nd exon. Because 4 is not a multiple of 3, this mutation adds one extra codon and shifts the reading frame of everything downstream.
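The arithmetic behind the frameshift can be sketched as follows: dividing the insertion length by 3 gives the number of whole codons added, and the remainder gives the shift in reading frame. The function names and the 12-base toy sequence are assumptions for illustration, not the assigned sequence.

```python
def insertion_effect(n_inserted):
    """Return (whole codons added, reading-frame shift in bases) for an insertion."""
    return divmod(n_inserted, 3)


def codons(seq):
    """Split a sequence into complete codons, dropping any trailing partial codon."""
    usable = len(seq) - len(seq) % 3
    return [seq[i:i + 3] for i in range(0, usable, 3)]


print(insertion_effect(4))  # (1, 1): one extra codon plus a 1-base frameshift

# Hypothetical toy reading frame with a 4-base insertion after the first codon.
reference = "AAACCCGGGTTT"
mutant = reference[:3] + "GCTG" + reference[3:]

print(codons(reference))  # ['AAA', 'CCC', 'GGG', 'TTT']
print(codons(mutant))     # ['AAA', 'GCT', 'GCC', 'CGG', 'GTT']
```

Every codon after the insertion point changes in the mutant, which is why removing the insertion was necessary before the translation could be compared with the reference.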

The translated exons (with the insertion removed) match the translation on the accession page for Mus musculus (NM_008220.5).

CTGCAGGGTAACACCCTGGCATTGGCCAATCTGCTCAGAGAGGACAGAGTGGGCAGGAGCCAGCATTGGGTATATAAAGCTGAGCAGGGTCAGTTGCTTCTTACGTTTGCTTCTGATTCTGTTGTGTTGACTTGCAACCTCAGAAACAGACATCATGGTGCACCTGACTGATGCTGAGAAGGCTGCTGTCTCTGGCCTGTGGGGAAAGGTGAACGCCGATGAAGTTGGTGGTGAGGCCCTGGGCAGGTTGGTATCCAGGTTACAAGGCAGCTCACAAGTAGAAGTTGGGTGCTTGGAGACAGAGGTCTGCTTTCCAGCAGGCACTAACTTTGAGTGTCCCCTGTCTATGTTTCCCTTTTTAAGCTGCTGGTTGTCTACCCTTGGACCCAGCGGTACTTTGATAGCTTTGGAGACCTATCCTCTGCCTCTGCTATCATGGGTAATGCCAAAGTGAAGGCCCATGGCAAGAAAGTGATAACTGCCTTTAACGATGGCCTGAATCACTTGGACAGCCTCAAGGGCACCTTTGCCAGCCTCAGTGAGCTCCACTGTGACAAGCTGCATGTGGATCCTGAGAACTTCAGGGTGAGTCTGATGGGCACCTCCTGGGTTTCCTTCCCCTGGCTATTCTGCTCAACCTTCCTATCAGAAGGAAAGGGGAAGCGATTCTAGGGAGCAGTCTCCATGACTGTGTGTGGAGTGTTGACAAGAGTTTGGATATTTTATTCTCTACTCAGAATCGCTGCTCCCCCTCACTCTGTTCTGTGTTGTCATTTCCTCTTTCTTTGGTAAGCTTTTAATTTCCAGTTGCATTTTACTAAATTAATTAAGCTGGTTATTTACTTCCCATCCTGATATCAGCTTCCCCTCCTCCTTTCCTCCCAGTCCTTCTCTCTCTCCTCTCTCTTTCTCTAATCCTTTCCTTTCCCTCAGTTCATTTCTTCTTCTTTGATCTACTTTTGTTTGTCTTTTTAAATATTGCCTTGTAACTTACTCAGAGGACAAGGAAGATATGTCCCTGTTTCTTCTCATAGCTCTCAAGAATAGTAGCATAATTGGCTTTTATGCCAGGGTGACAGGGGAAGAATATATTTTACATATAAATTCTGTTTGACATAGGATTCTTATAATAATTTGTCAGCAGTTTAAGGTTGCAAACAAATGTCTTTATAAATAAGCCTGCAGTATCTGGTATTTTTGCTCTACAGTTATGTTGATGGTTCTTCCATCTTCCCACAGCTCCTGGGCAATATGATCGTGATTGTGCTGGGCCACCACCTGGGCAAGGATTTCACCCCCGCTGCACAGGCTGCCTTCCAGAAGGTGGTGGCTGGAGTGGCTGCTGCCCTGGCTCACAAGTACCACTAAGCCCCTTTTCTGCTATTGTCTATGCACAAAGGTTATATGTCCCCTAGAGAAAAACTGTCAATTGTGGGGAAATGATGAAGACCTTTGGGCATCTAGCTTTTATCTAATAAATGATATTTACTGTCATCTCAATTCTGTGTTTTGATTACTTTTGTTTCTTGCAAGGATTAATGTGAAATATTTATTATATAAAGCAGTTGGGGCATGCTGGAGGGAAGGAAGTGAGGGTAAA

*PURPLE = the A is a mutation that should be a G

*RED = GCTG is an insertion

mRNA for assigned sequence:

TTACGTTTGCTTCTGATTCTGTTGTGTTGACTTGCAACCTCAGAAACAGACATCATGGTGCACCTGACTGATGCTGAGAAGGCTGCTGTCTCTGGCCTGTGGGGAAAGGTGAACGCCGATGAAGTTGGTGGTGAGGCCCTGGGCAGGTTGGCTGCTGGTTGTCTACCCTTGGACCCAGCGGTACTTTGATAGCTTTGGAGACCTATCCTCTGCCTCTGCTATCATGGGTAATGCCAAAGTGAAGGCCCATGGCAAGAAAGTGATAACTGCCTTTAACGATGGCCTGAATCACTTGGACAGCCTCAAGGGCACCTTTGCCAGCCTCAGTGAGCTCCACTGTGACAAGCTGCATGTGGATCCTGAGAACTTCAGGCTCCTGGGCAATATGATCGTGATTGTGCTGGGCCACCACCTGGGCAAGGATTTCACCCCCGCTGCACAGGCTGCCTTCCAGAAGGTGGTGGCTGGAGTGGCTGCTGCCCTGGCTCACAAGTACCACTAAGCCCCTTTTCTGCTATTGTCTATGCACAAAGGTTATATGTCCCCTAGAGAAAAACTGTCAATTGTGGGGAAATGATGAAGACCTTTGGGCATCTAGCTTTTATCTAATAAATGATATTTACTGTC

SNPs have been identified in this sequence:

I found 43 SNPs in my sequence.

Purifying or diversifying selection?

The gene is under diversifying selection because there were more nonsynonymous (missense) SNPs than synonymous SNPs.

I used 10 sequences to find SNPs in my mRNA sequence. When I compared the original mRNA sequence with the mRNA sequence modified to carry the SNPs, there were 43 SNPs. When I translated both sequences, I found that 24 of the SNPs were nonsynonymous (they changed the amino acid) and 19 were synonymous.

Synonymous to Nonsynonymous

        19 : 24
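A minimal sketch of how a single-codon change can be classified as synonymous or nonsynonymous, using the standard genetic code. The function name and the example codons are my own for illustration. The final lines only divide the counts reported above; that raw ratio is not a length-normalized dN/dS estimate.

```python
# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def classify_snp(ref_codon, alt_codon):
    """Label a codon change by whether it alters the encoded amino acid."""
    ref_aa = CODON_TABLE[ref_codon.upper()]
    alt_aa = CODON_TABLE[alt_codon.upper()]
    return "synonymous" if ref_aa == alt_aa else "nonsynonymous"


print(classify_snp("GCT", "GCC"))  # synonymous: both codons encode alanine (A)
print(classify_snp("GCT", "TCT"))  # nonsynonymous: alanine (A) becomes serine (S)

# Counts reported in the text above.
synonymous, nonsynonymous = 19, 24
print(synonymous + nonsynonymous)            # 43
print(round(nonsynonymous / synonymous, 2))  # 1.26
```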

Original (non-modified) mRNA Protein:

LRLLLILLC-LATSETDIMVHLTDAEKAAVSGLWGKVNADEVGGEALGRLLVVYPWTQRYFDSFGDLSSASAIMGNAKVKAHGKKVITAFNDGLNHLDSLKGTFASLSELHCDKLHVDPENFRLLGNMIVIVLGHHLGKDFTPAAQAAFQKVVAGVAAALAHKYH-APFLLLSMHKGYMSPREKLSIVGK--RPLGI-LLSNK-YLLSSQKKKKK

Modified mRNA Protein:

LRWLLSLFC-LATSETDIMVHLTDAEKSAVSCLWAKVNPDAIGGEALGRLLVVYPWAQRYFDSFGDLSSASAIMGNPKVKAHGKKVITAFNEGLKNLDNLKGTFASLSELHCDKLHVDPDNFRLLGSAIVIVLGHLLGKDFTPDAQAAFQKVVAGVATALAHKYH-AHFLLLSMHKGYMFPREKLSSVGK--RPLGI-LLSNK-YLLSSQKKKKK

Blue = amino acids that are the same in both translations

Yellow = amino acids that differ between the two translations
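The comparison between the two translations can also be done programmatically. This sketch (the function name is my own) lists the positions where two equal-length translations differ, shown here on only the first ten characters of the original and modified translations above.

```python
def differing_positions(original, modified):
    """Return (1-based position, original residue, modified residue) for each mismatch."""
    if len(original) != len(modified):
        raise ValueError("translations must have equal length")
    return [
        (i, a, b)
        for i, (a, b) in enumerate(zip(original, modified), start=1)
        if a != b
    ]


# First ten characters of each translation ('-' marks a stop codon).
original_start = "LRLLLILLC-"
modified_start = "LRWLLSLFC-"

for position, old, new in differing_positions(original_start, modified_start):
    print(f"position {position}: {old} -> {new}")
# position 3: L -> W
# position 6: I -> S
# position 8: L -> F
```

Running the same function on the full translations would list every amino acid that the color coding marked as different.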

Do orthologous genes occur in other species of animals?

An orthologous gene is a gene found in different species that descends from a single gene in their common ancestor and was separated by speciation. Yes, orthologous genes occur in other species of animals, both vertebrate and invertebrate. More broadly, hemoglobin occurs in all the kingdoms of living organisms.

Phylogenetic tree for all paralogous members of the gene family present in the reference human genome:

I used Clustal Omega to make a phylogenetic tree for the 4 hemoglobin subunits in Homo sapiens (see below).

Phylogenetic tree for the orthologous sequences:

I used Clustal Omega to create a phylogenetic tree with orthologous sequences (see below). The species I used were elephant shark (cartilaginous fish), perch (bony fish), salamander (amphibian), garter snake (reptile), chicken (bird), human (eutherian mammal), mouse (eutherian mammal), and opossum (marsupial mammal).

IMAGE

Part 3:

Final: