Phased Tetraploid Potato Genome Download

Phased Genome Assemblies and Annotation of Tetraploid Potato: Atlantic, Castle Russet, Avenger, Altus, Colomba, Spunta

January 11, 2022 - The paper "Phased, chromosome-scale genome assemblies of tetraploid potato reveal a complex genome, transcriptome, and predicted proteome landscape underpinning genetic diversity" by Hoopes et al. has been published in Molecular Plant.

The genome assembly and annotation files are available below. The entire data set for the paper is available on Data Dryad at https://doi.org/10.5061/dryad.3n5tb2rhw.

Atlantic Genome Assembly (v3):

Updated Annotation (v3) for the Phased Assembly of Atlantic Tetraploid Potato

February 17, 2023 - The Buell Lab at the University of Georgia is pleased to make available an updated genome annotation for the Atlantic genome assembly (ATL_v3). ATL_v3 was annotated as described in Hoopes et al. (2022), with the addition of new RNA-seq libraries and Oxford Nanopore (ONT) cDNA libraries. The methods for processing the new RNA-seq and ONT cDNA libraries are as follows:

RNA-seq libraries were processed for genome annotation by first cleaning with Cutadapt (Martin 2011, v2.10) using a minimum length of 100 nt and a quality cutoff of 10, then aligning the cleaned reads to the respective genome using HISAT2 (Kim et al. 2019, v2.1.0). Oxford Nanopore (ONT) cDNA reads were processed with Pychopper (v2.5.0; github.com/epi2me-labs/pychopper), and trimmed reads longer than 500 nt were aligned to the respective genome using minimap2 (H. Li 2018, v2.17-r941) with a maximum intron length of 5,000 nt. The aligned RNA-seq and ONT cDNA reads were each assembled using StringTie (Kovaka et al. 2019, v2.2.1), and transcripts shorter than 500 nt were removed.
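
The final filtering step above (removing assembled transcripts shorter than 500 nt) can be sketched in Python as follows. This is a minimal illustration, not the script used for the ATL_v3 annotation: it assumes the assembled transcripts are in GTF format with "exon" features carrying a transcript_id attribute, and it assumes that transcript length means spliced length (the sum of exon lengths), which the description above does not specify. The function name filter_short_transcripts is hypothetical.

```python
import re
from collections import defaultdict

# Matches the transcript_id attribute in column 9 of a GTF record.
TRANSCRIPT_ID = re.compile(r'transcript_id "([^"]+)"')


def filter_short_transcripts(gtf_lines, min_len=500):
    """Return GTF records whose transcript has a spliced length >= min_len.

    Assumption: spliced length is the sum of the transcript's exon lengths.
    Comment lines (starting with '#') are kept; blank lines are dropped.
    """
    lines = [ln.rstrip("\n") for ln in gtf_lines if ln.strip()]

    # First pass: total exon length per transcript (GTF coordinates are
    # 1-based and inclusive, hence the +1).
    spliced_len = defaultdict(int)
    for ln in lines:
        if ln.startswith("#"):
            continue
        cols = ln.split("\t")
        if len(cols) >= 9 and cols[2] == "exon":
            match = TRANSCRIPT_ID.search(cols[8])
            if match:
                spliced_len[match.group(1)] += int(cols[4]) - int(cols[3]) + 1

    # Second pass: keep every record that belongs to a long-enough transcript.
    kept = []
    for ln in lines:
        if ln.startswith("#"):
            kept.append(ln)
            continue
        cols = ln.split("\t")
        match = TRANSCRIPT_ID.search(cols[8]) if len(cols) >= 9 else None
        if match and spliced_len[match.group(1)] >= min_len:
            kept.append(ln)
    return kept
```

The function accepts any iterable of lines, so an open file handle can be passed directly. Records without a transcript_id attribute are dropped.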

Note: the previous Atlantic v2.0 assembly and annotation can be found on Data Dryad at https://doi.org/10.5061/dryad.3n5tb2rhw.

JBrowse Genome Browser

Genome Assembly:

ATL_v3.asm.fa.gz - Genome assembly for Atlantic - v3
ATL_v3.asm.hm.fa.gz - Hard masked genome assembly for Atlantic - v3
ATL_v3.asm.sm.fa.gz - Soft masked genome assembly for Atlantic - v3
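
As a quick sanity check after downloading, the masked assemblies can be summarized with a few lines of Python. The sketch below assumes the usual conventions, which are not stated on this page: soft masking writes repeats as lowercase bases, and hard masking replaces them with N. Note that N also marks assembly gaps, so the hard-masked count is an upper bound on repeat content. The function name masking_summary is hypothetical.

```python
import gzip


def masking_summary(fasta_path):
    """Count total, lowercase (soft-masked) and N (hard-masked or gap) bases
    in a gzip-compressed FASTA file.

    Assumptions: lowercase letters mark soft-masked repeats and uppercase N
    marks hard-masked sequence; N is also used for gaps, so 'hard' may
    overestimate repeat content.
    """
    total = soft = hard = 0
    with gzip.open(fasta_path, "rt") as handle:
        for line in handle:
            if line.startswith(">"):
                continue  # skip FASTA header lines
            seq = line.strip()
            total += len(seq)
            soft += sum(1 for base in seq if base.islower())
            hard += seq.count("N")
    return {"total": total, "soft": soft, "hard": hard}
```

For example, masking_summary("ATL_v3.asm.sm.fa.gz") returns the three counts for the soft masked assembly, provided the file is in the working directory.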

Genome Annotation:

High Confidence Gene Model Set

ATL_v3.hc_gene_models.cdna.fa.gz - Transcript sequences (cDNA) of the high confidence gene models
ATL_v3.hc_gene_models.cds.fa.gz - Coding sequences (CDS) of the high confidence gene models
ATL_v3.hc_gene_models.pep.fa.gz - Protein sequences of the high confidence gene models
ATL_v3.hc_gene_models.gff3.gz - High confidence gene models annotation in GFF3 format

Representative High Confidence Gene Model Set

High confidence representative gene models are a subset of the high confidence gene model set. Each representative gene model is the isoform with the longest CDS at each locus.

ATL_v3.hc_gene_models.repr.cdna.fa.gz - Transcript sequences (cDNA) of the representative high confidence gene models
ATL_v3.hc_gene_models.repr.cds.fa.gz - Coding sequences (CDS) of the representative high confidence gene models
ATL_v3.hc_gene_models.repr.pep.fa.gz - Protein sequences of the representative high confidence gene models
ATL_v3.hc_gene_models.repr.gff3.gz - Representative high confidence gene models annotation in GFF3 format
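
The rule used to define the representative set (the isoform with the longest CDS at each locus) can be illustrated with a short Python sketch. It is not the pipeline that produced the files above. It assumes a standard GFF3 layout in which mRNA features carry ID and Parent (gene) attributes and CDS features point to their mRNA through Parent. Ties are broken in favor of the isoform that appears first in the file, which is an assumption; the tie-breaking rule is not described here. The function names are hypothetical.

```python
from collections import defaultdict


def parse_attributes(field):
    """Parse a GFF3 attribute column (key=value;key=value) into a dict."""
    return dict(item.split("=", 1) for item in field.strip().split(";") if "=" in item)


def pick_representatives(gff3_lines):
    """Map each gene ID to the ID of its mRNA with the longest total CDS.

    Assumption: on a tie, the isoform listed first in the file is kept.
    """
    gene_of = {}                # mRNA ID -> gene ID, in file order
    cds_len = defaultdict(int)  # mRNA ID -> summed CDS length

    for line in gff3_lines:
        if line.startswith("#") or not line.strip():
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9:
            continue
        attrs = parse_attributes(cols[8])
        if cols[2] == "mRNA":
            gene_of[attrs["ID"]] = attrs["Parent"]
        elif cols[2] == "CDS":
            # GFF3 coordinates are 1-based and inclusive, hence the +1.
            for parent in attrs["Parent"].split(","):
                cds_len[parent] += int(cols[4]) - int(cols[3]) + 1

    best = {}
    for mrna, gene in gene_of.items():
        current = best.get(gene)
        if current is None or cds_len[mrna] > cds_len[current]:
            best[gene] = mrna
    return best
```

Applied to a high confidence GFF3 file, the returned mapping should correspond to the mRNA IDs in the matching representative file if the assumptions above hold.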

Working Gene Model Set

The set of working gene models contains all loci and isoforms from the annotation pipeline and may include artifacts such as partial gene models.

ATL_v3.working_models.cdna.fa.gz - Transcript sequences (cDNA) of the working gene models
ATL_v3.working_models.cds.fa.gz - Coding sequences (CDS) of the working gene models
ATL_v3.working_models.pep.fa.gz - Protein sequences of the working gene models
ATL_v3.working_models.gff3.gz - Working gene models annotation in GFF3 format
ATL_v3.functional_annotation.txt.gz - Functional annotation for the working gene models
ATL_v3.working_models.go_slim.obo.gz - GO Slim annotation for the working gene models

Representative Working Gene Model Set

The representative working gene models are a subset of the working gene model set. Each representative gene model is the isoform with the longest CDS at each locus.

ATL_v3.working_models.repr.cdna.fa.gz - Transcript sequences (cDNA) of the representative working gene models
ATL_v3.working_models.repr.cds.fa.gz - Coding sequences (CDS) of the representative working gene models
ATL_v3.working_models.repr.pep.fa.gz - Protein sequences of the representative working gene models

Castle Russet Genome Assembly:

CR_v2.0_pseudomolecules.fasta.gz - Castle Russet genome assembly with phased and unphased pseudomolecules
CR_v2.0_asm.fasta.gz - Castle Russet genome assembly with phased pseudomolecules, unphased pseudomolecules, and unplaced scaffolds
CR_v2.0_unplaced_scaffolds.fasta.gz - Castle Russet genome assembly with unplaced scaffolds
Potato_POR06V12_Phased_scaffolds.fasta.gz - Castle Russet DeNovoMagic™ genome assembly with phased scaffolds
Potato_POR06V12_Phased_Unplaced_scaffolds.fasta.gz - Castle Russet DeNovoMagic™ genome assembly with unphased scaffolds
CR_v2.0_asm.hm.fasta.gz - Repeat masked Castle Russet v2.0 genome assembly
CR_v2.0_asm.sm.fasta.gz - Soft repeat masked Castle Russet v2.0 genome assembly

Castle Russet Genome Annotation:

Castle Russet v2.0 Pseudomolecules - High Confidence Gene Model Set

cr.hc.pm.cdna.fasta.gz - Transcript sequences (cDNA) of high confidence gene models
cr.hc.pm.cds.fasta.gz - Coding sequences (CDS) of high confidence gene models
cr.hc.pm.pep.fasta.gz - Protein sequences of high confidence gene models
cr.hc.pm.locus_assign.gff3.gz - High confidence gene models annotation in GFF3 format

Castle Russet v2.0 Pseudomolecules - Representative High Confidence Gene Model Set

High confidence representative gene models are a subset of the high confidence gene model set. Each representative gene model is the isoform with the longest CDS at each locus.

cr.hc.repr.pm.pep.fasta.gz - Protein sequences of the representative high confidence gene models
cr.hc.repr.pm.locus_assign.gff3.gz - Representative high confidence gene models annotation in GFF3 format

Castle Russet v2.0 Pseudomolecules - Working Gene Model Set

The set of working gene models contains all loci and isoforms from the annotation pipeline and may include artifacts such as partial gene models.

cr.working_models.pm.cdna.fasta.gz - Transcript sequences (cDNA) of working gene models
cr.working_models.pm.cds.fasta.gz - Coding sequences (CDS) of working gene models
cr.working_models.pm.pep.fasta.gz - Protein sequences of working gene models
cr.working_models.pm.locus_assign.gff3.gz - Working gene models annotation in GFF3 format
cr.pm.functional_annotation.txt.gz - Functional annotation for the working gene models

Castle Russet v2.0 Pseudomolecules - InterPro and GO term assignment

cr.pm.iprscan_output.tsv.gz - InterProScan search results for the working gene models
cr.pm.iprscan_go_terms.txt.gz - InterProScan assigned GO terms for the working gene models
cr.pm.gene_models.go.txt.g - GO terms assigned to the working gene models by best hit to the Arabidopsis proteome (TAIR10)
cr.pm.gene_models.go_slim.txt.gz - Plant GOSlim terms assigned to the working gene models by best hit to the Arabidopsis proteome (TAIR10)

Avenger Genome Assembly:

Avenger_NRQ-11019_Fully_phased_scaffolds.fasta.gz - Avenger DeNovoMagic™ genome assembly with phased scaffolds
Avenger_NRQ-11019_Unphased_scaffolds.fasta.gz - Avenger DeNovoMagic™ genome assembly with unphased scaffolds
avenger_phased_asm.hm.fa.gz - Repeat masked Avenger genome assembly
avenger_phased_asm.sm.fa.gz - Soft repeat masked Avenger genome assembly

Avenger Genome Annotation:

Avenger Phased Scaffolds - High Confidence Gene Model Set

avenger.gene_models.hc.cdna.fa.gz - Transcript sequences (cDNA) of high confidence gene models
avenger.gene_models.hc.cds.fa.gz - Coding sequences (CDS) of high confidence gene models
avenger.gene_models.hc.pep.fa.gz - Protein sequences of high confidence gene models
avenger.gene_models.hc.gff3.gz - High confidence gene models annotation in GFF3 format

Avenger Phased Scaffolds - Representative High Confidence Gene Model Set

High confidence representative gene models are a subset of the high confidence gene model set. Each representative gene model is the isoform with the longest CDS at each locus.

avenger.gene_models.hc.repr.pep.fa.gz - Protein sequences of the representative high confidence gene models
avenger.gene_models.hc.repr.gff3.gz - Representative high confidence gene models annotation in GFF3 format

Avenger Phased Scaffolds - Working Gene Model Set

The set of working gene models contains all loci and isoforms from the annotation pipeline and may include artifacts such as partial gene models.

avenger.working_models.cdna.fa.gz - Transcript sequences (cDNA) of working gene models
avenger.working_models.cds.fa.gz - Coding sequences (CDS) of working gene models
avenger.working_models.pep.fa.gz - Protein sequences of working gene models
avenger.working_models.gff3.gz - Working gene models annotation in GFF3 format
avenger.functional_annotation.txt - Functional annotation for the working gene models

Avenger Phased Scaffolds - InterPro and GO term assignment

avenger.iprscan_output.tsv.gz - InterProScan search results for the working gene models
avenger.iprscan_go_terms.txt.gz - InterProScan assigned GO terms for the working gene models
avenger.gene_models.go.txt.gz - GO terms assigned to the working gene models by best hit to the Arabidopsis proteome (TAIR10)
avenger.gene_models.go_slim.txt.gz - Plant GOSlim terms assigned to the working gene models by best hit to the Arabidopsis proteome (TAIR10)

Altus Genome Assembly:

Altus_Fully_phased.scaffolds.fasta.gz - Altus DeNovoMagic™ genome assembly with phased scaffolds
Altus_Unphased.scaffolds.fasta.gz - Altus DeNovoMagic™ genome assembly with unphased scaffolds
altus_phased_asm.hm.fa.gz - Repeat masked Altus genome assembly
altus_phased_asm.sm.fa.gz - Soft repeat masked Altus genome assembly

Altus Genome Annotation:

Altus Phased Scaffolds - High Confidence Gene Model Set

altus.gene_models.hc.cdna.fa.gz - Transcript sequences (cDNA) of high confidence gene models
altus.gene_models.hc.cds.fa.gz - Coding sequences (CDS) of high confidence gene models
altus.gene_models.hc.pep.fa.gz - Protein sequences of high confidence gene models
altus.gene_models.hc.gff3.gz - High confidence gene models annotation in GFF3 format

Altus Phased Scaffolds - Representative High Confidence Gene Model Set

High confidence representative gene models are a subset of the high confidence gene model set. Each representative gene model is the isoform with the longest CDS at each locus.

altus.gene_models.hc.repr.pep.fa.gz - Protein sequences of the representative high confidence gene models
altus.gene_models.hc.repr.gff3.gz - Representative high confidence gene models annotation in GFF3 format

Altus Phased Scaffolds - Working Gene Model Set

The set of working gene models contains all loci and isoforms from the annotation pipeline and may include artifacts such as partial gene models.

altus.working_models.cdna.fa.gz - Transcript sequences (cDNA) of working gene models
altus.working_models.cds.fa.gz - Coding sequences (CDS) of working gene models
altus.working_models.pep.fa.gz - Protein sequences of working gene models
altus.working_models.gff3.gz - Working gene models annotation in GFF3 format
altus.functional_annotation.txt - Functional annotation for the working gene models

Altus Phased Scaffolds - InterPro and GO term assignment

altus.iprscan_output.tsv.gz - InterProScan search results for the working gene models
altus.iprscan_go_terms.txt.gz - InterProScan assigned GO terms for the working gene models
altus.gene_models.go.txt.gz - GO terms assigned to the working gene models by best hit to the Arabidopsis proteome (TAIR10)
altus.gene_models.go_slim.txt.gz - Plant GOSlim terms assigned to the working gene models by best hit to the Arabidopsis proteome (TAIR10)

Colomba Genome Assembly:

Colomba_fully_phased.scaffolds.fasta.gz - Colomba DeNovoMagic™ genome assembly with phased scaffolds
Colomba_Unphased.scaffolds.fasta.gz - Colomba DeNovoMagic™ genome assembly with unphased scaffolds
colomba_phased_asm.hm.fa.gz - Repeat masked Colomba genome assembly
colomba_phased_asm.sm.fa.gz - Soft repeat masked Colomba genome assembly

Colomba Genome Annotation:

Colomba Phased Scaffolds - High Confidence Gene Model Set

colomba.gene_models.hc.cdna.fa.gz - Transcript sequences (cDNA) of high confidence gene models
colomba.gene_models.hc.cds.fa.gz - Coding sequences (CDS) of high confidence gene models
colomba.gene_models.hc.pep.fa.gz - Protein sequences of high confidence gene models
colomba.gene_models.hc.gff3.gz - High confidence gene models annotation in GFF3 format

Colomba Phased Scaffolds - Representative High Confidence Gene Model Set

High confidence representative gene models are a subset of the high confidence gene model set. Each representative gene model is the isoform with the longest CDS at each locus.

colomba.gene_models.hc.repr.pep.fa.gz - Protein sequences of the representative high confidence gene models
colomba.gene_models.hc.repr.gff3.gz - Representative high confidence gene models annotation in GFF3 format

Colomba Phased Scaffolds - Working Gene Model Set

The set of working gene models contains all loci and isoforms from the annotation pipeline and may include artifacts such as partial gene models.

colomba.working_models.cdna.fa.gz - Transcript sequences (cDNA) of working gene models
colomba.working_models.cds.fa.gz - Coding sequences (CDS) of working gene models
colomba.working_models.pep.fa.gz - Protein sequences of working gene models
colomba.working_models.gff3.gz - Working gene models annotation in GFF3 format
colomba.functional_annotation.txt - Functional annotation for the working gene models

Colomba Phased Scaffolds - InterPro and GO term assignment

colomba.iprscan_output.tsv.gz - InterProScan search results for the working gene models
colomba.iprscan_go_terms.txt.gz - InterProScan assigned GO terms for the working gene models
colomba.gene_models.go.txt.gz - GO terms assigned to the working gene models by best hit to the Arabidopsis proteome (TAIR10)
colomba.gene_models.go_slim.txt.gz - Plant GOSlim terms assigned to the working gene models by best hit to the Arabidopsis proteome (TAIR10)

Spunta Genome Assembly:

Spunta_NRQ_11022_fully_phased_scaffolds.fasta.gz - Spunta DeNovoMagic™ genome assembly with phased scaffolds
spunta_NRQ_11022_unphased_scaffolds.fasta.gz - Spunta DeNovoMagic™ genome assembly with unphased scaffolds
spunta_phased_asm.hm.fa.gz - Repeat masked Spunta genome assembly
spunta_phased_asm.sm.fa.gz - Soft repeat masked Spunta genome assembly

Spunta Genome Annotation:

Spunta Phased Scaffolds - High Confidence Gene Model Set

spunta.gene_models.hc.cdna.fa.gz - Transcript sequences (cDNA) of high confidence gene models
spunta.gene_models.hc.cds.fa.gz - Coding sequences (CDS) of high confidence gene models
spunta.gene_models.hc.pep.fa.gz - Protein sequences of high confidence gene models
spunta.gene_models.hc.gff3.gz - High confidence gene models annotation in GFF3 format

Spunta Phased Scaffolds - Representative High Confidence Gene Model Set

High confidence representative gene models are a subset of the high confidence gene model set. Each representative gene model is the isoform with the longest CDS at each locus.

spunta.gene_models.hc.repr.pep.fa.gz - Protein sequences of the representative high confidence gene models
spunta.gene_models.hc.repr.gff3.gz - Representative high confidence gene models annotation in GFF3 format

Spunta Phased Scaffolds - Working Gene Model Set

The set of working gene models contains all loci and isoforms from the annotation pipeline and may include artifacts such as partial gene models.

spunta.working_models.cdna.fa.gz - Transcript sequences (cDNA) of working gene models
spunta.working_models.cds.fa.gz - Coding sequences (CDS) of working gene models
spunta.working_models.pep.fa.gz - Protein sequences of working gene models
spunta.working_models.gff3.gz - Working gene models annotation in GFF3 format
spunta.functional_annotation.txt - Functional annotation for the working gene models

Spunta Phased Scaffolds - InterPro and GO term assignment

spunta.iprscan_output.tsv.gz - InterProScan search results for the working gene models
spunta.iprscan_go_terms.txt.gz - InterProScan assigned GO terms for the working gene models
spunta.gene_models.go.txt.gz - GO terms assigned to the working gene models by best hit to the Arabidopsis proteome (TAIR10)
spunta.gene_models.go_slim.txt.gz - Plant GOSlim terms assigned to the working gene models by best hit to the Arabidopsis proteome (TAIR10)

This work is supported by grants from the National Science Foundation (IOS-2140176), the U.S. Department of Agriculture (2019-51181-30021), and funds from the Georgia Research Alliance, Georgia Seed Development, and the University of Georgia.